Draft scope study. The system described below is in design and planning. Phase 1 build starts the week of April 27, 2026, with a target Phase 1 ship by mid-May. The Agent Architecture case study is the live production system this design generalizes from.
Daniel Lepel | M365 & Azure Architect
Copilot Concierge: A Scope Study
A generalized M365 triage agent for any Outlook user, on Copilot Studio and Azure AI Foundry.
The Agent Architecture case study describes a production system I built and operate against my own job search. Inside that system is one capability worth generalizing: triage and draft generation across Outlook and Teams, with a brand voice that writes like the user. Copilot Concierge is that slice, redesigned for the Microsoft agent stack so any M365 tenant can deploy it. This document is the scope, the architecture, and the build plan. The product does not exist yet. The design is locked.
Planned product
Runtime: Azure AI Foundry · Copilot Studio
Reasoning model: Anthropic Claude via Foundry
Companion to the Agent Architecture case study
Section 1 | Positioning
One slice of my pipeline, generalized to every Outlook user.
The Agent Architecture case study is the production system. Sixteen specialist skills, four-tier architecture, continuous operation since February 6, 2026, and a hundred and forty-five opportunities tracked end to end. The system works because it is built for one operator and one operating problem. That is also why it does not generalize on its own. Most Microsoft customers will not deploy a Claude Agent SDK pipeline and a Python service mesh into their tenant.
Inside that system is one capability that does generalize: inbox and message triage with grounded draft generation. Every knowledge worker who lives in Outlook and Teams faces the same operating problem I solved for myself. Too many messages. Unclear priorities. Drafts that should be ready before they sit down. Action items that need to land somewhere durable instead of pinging a fourth notification surface. The pattern transfers cleanly. The packaging has to change.
Copilot Concierge is that capability rebuilt on the Microsoft agent stack. Azure AI Foundry for the orchestration, Copilot Studio for the surface, Anthropic Claude via Foundry for the reasoning model, Microsoft Graph as the tool layer. Same architectural discipline as my live system. Different runtime, different reach. One operator becomes any operator with an M365 license.
This document is forward-looking. It is the scope study and architecture for a build that starts the week of April 27, 2026. The design is locked enough to share. The build plan is sized against my active interview calendar so it ships without competing for the time the pipeline still needs.
Section 2 | By the design
The shape of the build, in numbers.
The figures below are design commitments, not run metrics. The Agent Architecture case study has the live counts. This section is the target the Copilot Concierge build is sized against.
4 surfaces unified: Outlook | Teams 1:1 | @mentions | channels
6 priority categories: Critical to Noise, action-oriented
2 delivery modes: auto-reply or draft, per category, user toggled
2 voice layers: universal AI-tell detector + per-user profile
4 build phases: Outlook, Teams, summaries, To-Do
15 Sent Items sampled: per-user voice profile training set
80-115 hours to Phase 4: single-tenant demo, evening cadence
1 Copilot Studio agent: surface for the Foundry runtime
Hours estimate is for a working single-tenant demo on the operator's own M365 tenant. Multi-tenant commercial-grade packaging adds another two hundred hours and is out of scope for this document.
Section 3 | Capabilities
What it will do, in user terms.
Unified conversation triage
One pipeline reads Outlook mail, Teams one-on-one chats, and Teams @mentions through a single classification model. Channel posts feed a separate read-only summary stream. The user sees one queue of conversations needing attention, not three sources to scan.
Six-category priority classifier
Critical, Urgent, Important, Normal, FYI, Noise. Each category maps to a default action the agent recommends or executes. Categories are tunable per user. The classifier reads sender importance, semantic urgency, thread position, and calendar adjacency. Subject line is one feature among several.
Auto-reply and draft toggle
Per-category switch decides whether the agent stages a draft or sends it. Default is draft-only. Users graduate to auto-reply on the categories they trust first. Confidence threshold per category prevents low-confidence auto-replies even when the toggle is on.
Sender whitelist and blacklist
Override the per-category settings for individual senders or domains. Example rules: never auto-reply to anyone on the C-suite list; always draft-only for external senders. The override is per-user and edited from the settings page.
Two-layer brand voice system
Universal AI-tell detector ships enabled and catches the stock phrases that AI products generate by default. Per-user voice profile layers on top, trained from fifteen recent Sent Items the user approves once. The output reads like the user, not like a chatbot.
Teams channel summaries
Read-only scan across followed channels. Morning digest at a fixed time, plus on-demand via natural-language trigger in M365 Copilot chat. Topic clustering inside each channel so the user sees what was discussed, not a flat reverse-chronological dump.
Microsoft To-Do integration
Action items extracted from Outlook, Teams chats, and @mentions land in the user's To-Do list with a back-link to the source message. Priority maps from the classifier output. Extracted work goes into the existing task list instead of becoming one more notification stream competing with the inbox.
Audit log with one-click revoke
Every auto-action writes a row to an audit view. The user can review what the agent did, surface the original message, and revoke or correct any action. Trust scales when the user can see exactly what happened and when.
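The audit-and-revoke behavior described above can be sketched in a few lines. This is a minimal illustration, not the planned implementation: the `AuditRow` fields, the `AuditLog` class, and the in-memory dictionary are all assumptions made for the example, and undoing the underlying side effect (for instance recalling a message) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List
import uuid


@dataclass
class AuditRow:
    """One auto-action taken by the agent (field names are hypothetical)."""
    trace_id: str
    message_id: str   # back-reference to the original message
    action: str       # e.g. "auto_send", "stage_draft", "archive"
    reason: str       # one-line reason supplied by the classifier
    timestamp: str
    revoked: bool = False


class AuditLog:
    """In-memory stand-in for the audit view; the real store is not modeled."""

    def __init__(self) -> None:
        self._rows: Dict[str, AuditRow] = {}

    def write(self, message_id: str, action: str, reason: str) -> AuditRow:
        """Record an action and return the stored row."""
        row = AuditRow(
            trace_id=str(uuid.uuid4()),
            message_id=message_id,
            action=action,
            reason=reason,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self._rows[row.trace_id] = row
        return row

    def revoke(self, trace_id: str) -> AuditRow:
        """Mark an action as revoked; reversing its effect is up to the caller."""
        row = self._rows[trace_id]
        row.revoked = True
        return row

    def active(self) -> List[AuditRow]:
        """Return every row that has not been revoked."""
        return [r for r in self._rows.values() if not r.revoked]
```

A caller would write one row per auto-action and pass the row's `trace_id` back to `revoke` when the user clicks the revoke control.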
Section 3.5 | The six categories
Priority labels mapped to default actions.
Action-oriented, not abstract.
Most priority inbox tools use abstract labels: high, medium, low. That maps poorly to what the user actually does next. Copilot Concierge uses six categories, each tied to a specific recommended action. The categories are the surface the user interacts with. The classifier underneath is the work.
Critical
Act now
Explicit deadline today, escalation language, blocking someone else, or named in a meeting starting within two hours.
Default action: surface in the user's home view with audible alert; draft reply staged.
Urgent
Act this week
Deadline this week, repeated follow-ups from sender, or strong priority signals from the sender's role.
Default action: stage at top of the conversations queue; draft reply ready.
Important
Schedule
Substantive request that needs thought, no immediate timing pressure, but worth blocking time for.
Default action: draft reply staged; suggested calendar block on the user's next free slot.
Normal
Handle in flow
Standard business correspondence. Answer when the user gets to it. Most internal traffic lands here.
Default action: draft reply staged; queued behind higher-priority items.
FYI
Read later
Informational. Status updates, project reports, no response expected. Worth knowing about but not worth stopping for.
Default action: marked read; summary line in morning digest; no draft generated.
Noise
Ignore
Newsletters, promotional, automated notifications with no action required. The signal-free tail of the inbox.
Default action: auto-archive to a Noise folder; suppressed from triage queue and digest.
The category labels are tunable. A user who prefers fewer buckets can collapse Important and Normal into one and rename the result. The default six is the recommended starting point because it maps cleanly to the four actions a knowledge worker actually takes: do now, schedule, read later, ignore.
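One way to picture the category model is an enum with a default-action table and a per-user relabeling map. The sketch below is an assumption-heavy illustration: `Category`, `DEFAULT_ACTION`, `display_label`, and the "Work queue" label are hypothetical names, and the action strings paraphrase the defaults listed above.

```python
from enum import Enum
from typing import Dict


class Category(Enum):
    CRITICAL = "Critical"
    URGENT = "Urgent"
    IMPORTANT = "Important"
    NORMAL = "Normal"
    FYI = "FYI"
    NOISE = "Noise"


# Default action per category, paraphrased from the descriptions above.
DEFAULT_ACTION: Dict[Category, str] = {
    Category.CRITICAL: "surface in home view with audible alert; stage draft",
    Category.URGENT: "stage at top of the conversations queue; draft ready",
    Category.IMPORTANT: "stage draft; suggest calendar block",
    Category.NORMAL: "stage draft; queue behind higher-priority items",
    Category.FYI: "mark read; summarize in morning digest; no draft",
    Category.NOISE: "auto-archive to Noise folder; suppress from queue and digest",
}


def display_label(category: Category, relabel: Dict[Category, str]) -> str:
    """Return the user's label for a category, falling back to the default name."""
    return relabel.get(category, category.value)


# Hypothetical user preference: collapse Important and Normal into one bucket.
collapsed = {Category.IMPORTANT: "Work queue", Category.NORMAL: "Work queue"}
print(display_label(Category.IMPORTANT, collapsed))
print(display_label(Category.NOISE, collapsed))
```

Keeping the six underlying categories fixed and relabeling only at display time means the classifier output never changes when a user renames or merges buckets.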
Section 4 | Architecture
How the pieces fit on the Microsoft stack.
Surface in Studio. Engineering in Foundry. Tools in Graph. State in M365.
Same architectural discipline as the production Agent Architecture system. Probabilistic work stays in the model. Deterministic work stays in code. The two halves never pretend to be each other. The runtime moves from Claude Agent SDK and Python services to Azure AI Foundry and Copilot Studio. The patterns transfer. The tools change to match where Microsoft customers already are.
Tier 1 | Surface
Microsoft 365 Copilot chat & Copilot Studio agent
User interaction lives where the user already is. Triage happens in the background; on-demand questions ("What did I miss in Teams today?") run through M365 Copilot chat. The Copilot Studio agent is the named entry point published to the tenant.
Tier 2 | Orchestrator
Azure AI Foundry agent service, Claude as reasoning model
The real engineering. Python orchestration code, Claude via Foundry as the reasoning engine, agent tools defined in code. Studio invokes Foundry; Foundry runs the classification, drafting, and decision logic. This split is what lets a Principal Architect ship something that looks like a Studio agent at the surface and reads like real engineering underneath.
Tier 3a | Graph API tools
Mail | Teams chat | calendar | To-Do
Microsoft Graph endpoints for every action: list mail messages, paginate Teams chats, read calendar adjacency, write To-Do tasks, stage and send Outlook drafts. OAuth scopes are held to least privilege. The full mapping is in the technical appendix.
Tier 3b | Deterministic services
Python functions inside the Foundry agent
Anything that must be the same every time: rule evaluation for the auto-reply toggle, sender whitelist matching, confidence threshold gating, audit log writes, voice profile fingerprint matching. Same pattern as the Python service mesh in the live system.
Tier 4 | State
M365-native, no external store
Per-user settings in Dataverse (the Power Platform store backing Copilot Studio). Audit log in Application Insights for queryability. Voice profile cached as a JSON document in user OneDrive so the user can see and edit it. No external database, no data exfiltration risk.
Ground truth
User-owned configuration
The user owns every tunable: category labels, auto-reply toggles, sender lists, voice profile, channel subscriptions. The agent reads these on every run and never overrides them silently. Same discipline as the CLAUDE.md file in the live system, applied at user scope.
Deterministic | Code
Probabilistic | Model
Section 4.5 | The flow
A new message, end to end.
Six steps. Arrival, context pull, classify, action decide, draft, stage or send.
The four-tier diagram is the structure. The swim lane below is the runtime. One inbound message flows left to right through the agent. Every step is a discrete tool call or a model invocation. The audit log captures the trace.
One inbound conversation, traced end to end. Steps 1 through 4 always run. Step 5 generates the draft. Step 6 either stages or sends, depending on the per-category toggle plus sender override plus confidence score.
01
Arrival
graph.subscribeToChange
Graph webhook fires on new mail or Teams chat. Foundry agent picks up the event. Trace ID issued.
Tool: Graph change notifications
02
Context Pull
graph.getThread + getCalendar
Pull the thread, the sender history, recent calendar adjacency, and the user's VIP and blacklist rules.
Tool: Graph mail | calendar | Dataverse
03
Classify
claude.classify
Six-category classifier runs on full thread context. Returns category plus confidence score plus a one-line reason for the audit log.
Tool: Claude via Foundry
04
Action Decide
rules.evaluateAction
Deterministic Python: per-category toggle plus sender override plus confidence threshold. Output: stage draft, auto-send, or do nothing.
Tool: Python rule engine
05
Draft
claude.draft + voice.lint
Generate the reply against the user's voice profile. Universal AI-tell linter strips banned phrases. Per-user profile shapes tone, sentence length, and sign-off.
Tool: Claude via Foundry | voice profile
06
Stage or Send
graph.createDraft or graph.sendMail
Stage the draft in Outlook or Teams, or auto-send if the rule engine cleared it. Write the audit log row regardless. Notify the user via the configured channel.
Tool: Graph mail | App Insights
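Step 04 is the deterministic gate between classification and sending, so it is the step most worth sketching in code. The version below is a hedged illustration of the stated rule (per-category toggle plus sender override plus confidence threshold). The names `CategoryRule`, `evaluate_action`, the `"draft_only"` override value, and the 0.8 threshold in the usage lines are assumptions, not the shipped design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    STAGE_DRAFT = "stage_draft"
    AUTO_SEND = "auto_send"
    DO_NOTHING = "do_nothing"


@dataclass(frozen=True)
class CategoryRule:
    drafts: bool           # False for categories that generate no reply (FYI, Noise)
    auto_reply: bool       # the per-category toggle
    min_confidence: float  # classifier confidence required before auto-send


def evaluate_action(rule: CategoryRule, confidence: float,
                    sender_override: Optional[str] = None) -> Action:
    """Decide what step 06 should do. Pure function: same inputs, same output."""
    if not rule.drafts:
        return Action.DO_NOTHING
    # A sender override (hypothetical value "draft_only") beats the category toggle.
    if sender_override == "draft_only":
        return Action.STAGE_DRAFT
    # The toggle alone is not enough; confidence must also clear the threshold.
    if rule.auto_reply and confidence >= rule.min_confidence:
        return Action.AUTO_SEND
    return Action.STAGE_DRAFT


normal = CategoryRule(drafts=True, auto_reply=True, min_confidence=0.8)
print(evaluate_action(normal, 0.9))
print(evaluate_action(normal, 0.9, sender_override="draft_only"))
```

Because the function has no model call and no I/O, every branch can be unit-tested, which is the point of keeping this decision out of the probabilistic half.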
Every named component in one place. User on top, M365 surfaces in the next layer, the Foundry orchestrator in the middle, Graph and Python services as the tool layer, M365-native state on the bottom. Orange outlines are MCP-style connectors. Gold fills are agent capabilities.
Layer 1 | User
Any M365 user with a Copilot license
Layer 2 | Surface
M365 Surfaces
Outlook
Teams
M365 Copilot chat
Microsoft To-Do
Agent Front Door
Copilot Concierge (Studio agent)
Layer 3 | Orchestrator
Runtime
Azure AI Foundry agent service
Anthropic Claude (Foundry)
Agent Capabilities
classify
draft
summarize-channel
extract-action-items
build-voice-profile
Layer 4 | Tool layer
Graph endpoints
/me/messages
/me/chats
/me/calendar
/me/todo/lists
/teams/channels
/subscriptions
Deterministic services
rule_engine.py
voice_lint.py
audit_writer.py
profile_trainer.py
Layer 5 | State (M365-native)
Configuration
Dataverse: settings
Dataverse: VIP / blacklist
Dataverse: channel subscriptions
Audit + voice
Application Insights: audit log
OneDrive: voice_profile.json
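Of the deterministic services listed above, `voice_lint.py` is the easiest to illustrate. The sketch below shows one plausible shape for a universal AI-tell filter: a case-insensitive phrase strip that also reports what it removed. The phrase list is invented for the example (the document does not publish the real one), and `lint_draft` is a hypothetical function name.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical banned phrases; the real list is not given in this document.
BANNED_PHRASES: Tuple[str, ...] = (
    "I hope this email finds you well",
    "Please don't hesitate to reach out",
    "As an AI language model",
)


def lint_draft(draft: str,
               banned: Sequence[str] = BANNED_PHRASES) -> Tuple[str, List[str]]:
    """Remove banned phrases (case-insensitive) and report which ones were hit."""
    hits: List[str] = []
    cleaned = draft
    for phrase in banned:
        # Also consume one trailing punctuation mark and any following whitespace.
        pattern = re.compile(re.escape(phrase) + r"[.,!]?\s*", re.IGNORECASE)
        if pattern.search(cleaned):
            hits.append(phrase)
            cleaned = pattern.sub("", cleaned)
    return cleaned.strip(), hits


cleaned, hits = lint_draft("I hope this email finds you well. The report is attached.")
print(cleaned)
print(hits)
```

Returning the hit list alongside the cleaned text gives the audit log something concrete to record about what the filter changed.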
Section 4.6 | Settings preview
What the user controls.
Every behavior the agent takes is owned by the user.
The settings page is where the agent earns trust. Every auto-action is a toggle, every override is editable, every category default is tunable. The view below is rendered against placeholder defaults, not a live tenant. The structure and the field model are exactly what ships in Phase 1.
Auto-reply per category
Critical
Surface and stage draft, do not auto-send.
Draft only
Urgent
Stage draft for review.
Draft only
Important
Stage draft, suggest calendar block.
Draft only
Normal
Auto-reply for known patterns (meeting confirmations, simple acknowledgments) over confidence threshold.
Auto on
FYI
Mark read, summarize in digest.
Auto on
Noise
Auto-archive to the Noise folder.
Auto on
Sender overrides
VIP list
Always classify at minimum Urgent. Always draft only, never auto-reply. Currently 4 senders.
Active
Blacklist
Suppress notifications. Auto-archive without classification. Currently 12 senders.
Active
Brand voice
Universal AI-tell filter
Strips known AI giveaways from every draft. Always on; users cannot turn it off.
Always on
Voice profile
Built from your last 15 sent items. Last refreshed 4 days ago. Edit profile to override inferred tone.
Active
Teams & To-Do
Teams 1:1 + @mentions
Treated as conversations. Same classification and draft pipeline as Outlook.
On
Teams channel digest
Morning summary at 7:00 AM ET across followed channels. Plus on-demand via Copilot chat.
On
Microsoft To-Do
Action items extracted from messages land in your To-Do list with source links.
On
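The settings preview above implies a small field model. A rough sketch of that model, using the placeholder defaults shown in the preview, might look like the following. Every field name here is an assumption, and the JSON serialization is only a convenient way to inspect the structure, not a claim about how Dataverse stores it.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

DRAFT_ONLY = "draft_only"
AUTO_ON = "auto_on"


def _default_modes() -> Dict[str, str]:
    """Placeholder per-category modes, mirroring the preview above."""
    return {
        "Critical": DRAFT_ONLY,
        "Urgent": DRAFT_ONLY,
        "Important": DRAFT_ONLY,
        "Normal": AUTO_ON,
        "FYI": AUTO_ON,
        "Noise": AUTO_ON,
    }


@dataclass
class ConciergeSettings:
    """Hypothetical per-user settings record; field names are illustrative."""
    category_modes: Dict[str, str] = field(default_factory=_default_modes)
    vip_senders: List[str] = field(default_factory=list)
    blacklist_senders: List[str] = field(default_factory=list)
    ai_tell_filter: bool = True           # shown as always on in the preview
    voice_profile_sample_size: int = 15   # Sent Items sampled for the profile
    teams_chat_enabled: bool = True
    channel_digest_time: str = "07:00"    # 7:00 AM ET in the preview
    todo_enabled: bool = True


settings = ConciergeSettings()
payload = json.dumps(asdict(settings), indent=2)
print(payload)
```

Using `default_factory` for the dictionary and lists keeps each user's settings independent, so editing one user's category modes cannot leak into another's.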
Section 5 | Phased build plan
Four phases to a single-tenant demo.
Sequenced against the active interview calendar.
Each phase is independently shippable. Phase 1 alone is a complete product on the operator's own tenant. Phases 2 through 4 are layered additions, not rewrites. Total estimate is 80 to 115 hours of evening work, spread over about seven weeks from the April 27 start to the mid-June Phase 4 target. Build start is the week of April 27 to allow the current interview calendar (Genesys, C2Q Round 2, NY Creates Round 2) to land first.
Phase 1
01
Outlook triage + drafts
Classification, draft generation, auto-reply toggle, audit log, voice profile, settings page. Single user, single inbox. Ships as a working demo on the operator's tenant.
40 to 60 hours | target ship mid-May
Phase 2
02
Teams 1:1 + @mentions
Same triage pipeline applied to Teams chat. Unified conversations queue across Outlook and Teams. Re-uses every component shipped in Phase 1.
15 to 20 hours | target ship late May
Phase 3
03
Teams channel summaries
Read-only scan of followed channels. Morning digest plus on-demand via M365 Copilot chat. Topic clustering inside each channel.
10 to 15 hours | target ship early June
Phase 4
04
Microsoft To-Do
Action item extraction across Outlook, Teams chats, and @mentions. Auto-staged To-Do tasks with priority mapping and back-links to source messages.
15 to 20 hours | target ship mid-June
If a job offer lands before Phase 1 ships, the project plan flips. Phase 1 becomes an onboarding-week proof-of-concept at the new role. Phases 2 through 4 stay on the post-hire track. The work isn't wasted on either path.
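The 80 to 115 hour total follows directly from the four per-phase ranges above. The short check below sums the low and high ends; the dictionary name `PHASE_HOURS` is just a label for this example.

```python
from typing import Dict, Tuple

# (low, high) hour estimates per phase, copied from the plan above.
PHASE_HOURS: Dict[str, Tuple[int, int]] = {
    "Phase 1: Outlook triage + drafts": (40, 60),
    "Phase 2: Teams 1:1 + @mentions": (15, 20),
    "Phase 3: Teams channel summaries": (10, 15),
    "Phase 4: Microsoft To-Do": (15, 20),
}

low = sum(lo for lo, _ in PHASE_HOURS.values())
high = sum(hi for _, hi in PHASE_HOURS.values())
print(f"Total: {low} to {high} hours")  # Total: 80 to 115 hours
```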
Section 6 | Decisions that shape the build
Five choices made before a single line of code.
01
Foundry runs the orchestration; Studio is the surface.
Azure AI Foundry runs the agent code in Python with Claude as the reasoning model. Copilot Studio publishes the named agent into M365 Copilot chat as the front door. The split keeps real engineering in code where a Principal Architect would put it, and the low-code piece serves only as the surface the user sees.
Why it matters: Customers asking "is this just a Studio agent" need to hear that the orchestration is real engineering, not a topic tree. The split answers that question architecturally instead of in conversation.
02
Draft-only by default; auto-reply is a per-category opt-in.
Every category ships in draft-only mode at first run. Users graduate to auto-reply on the categories they trust first, with a confidence threshold per category that prevents low-confidence auto-actions even when the toggle is on. Sender whitelist and blacklist override the per-category settings.
Why it matters: Trust scales when the user controls every auto-action and can audit and revoke any of them. The opposite is the fastest way to lose enterprise adoption.
03
Two-layer brand voice: universal filter plus per-user profile.
The universal AI-tell detector ships enabled and catches the stock phrases that AI products generate by default. Per-user voice profile layers on top, trained from fifteen recent Sent Items. The output reads like the user, not like a chatbot. The profile is editable; the universal filter is not.
Why it matters: Every AI draft tool today produces the same generic voice. A tool that writes like the user is a category unto itself. The universal filter is the floor; the profile is the ceiling.
04
Teams channels stay read-only.
Channel posts feed the summary stream and the To-Do extraction pipeline. The agent does not draft channel replies and does not auto-post. One-on-one chats and @mentions are treated like email because the blast radius is the same as a private message. Channels are public, and channel auto-reply is a fast way to embarrass someone.
Why it matters: The hardest part of any agent rollout is bounding the blast radius. Read-only on channels is a clear, defensible boundary that customers understand without needing the architecture explained.
05
Microsoft To-Do is the action sink.
Extracted action items land in Microsoft To-Do with back-links to the source message. Priority maps from the classifier output. The user opens one app to see what they need to do, instead of a fourth notification surface competing with the inbox.
Why it matters: Most agents add notifications. Copilot Concierge subtracts them by routing extracted work into the user's existing task system. The agent makes To-Do better, not the inbox louder.
Section 7 | Translation
From one user to a Microsoft cloud practice.
Same architecture, bigger surface area.
What Copilot Concierge does for one M365 user | What it positions for a Microsoft cloud practice
For one user: Unified triage across mail, Teams chats, and @mentions. One classifier, one settings surface, one audit log. Six categories, action-oriented, user-tunable.
For a practice: Reference architecture for any M365 agent rollout. Tenant assessment agents, license optimization agents, governance agents, security operations agents all share the same shape: paginate a Graph source, classify each record, decide an action, audit the result. Copilot Concierge proves the pattern at user scale. Customer engagements scale the same pattern up.
For one user: Per-category auto-reply with confidence threshold and sender overrides. Every auto-action is opt-in, gated, audited, and revocable.
For a practice: Governance template for agent write actions. The same opt-in, threshold, override, audit pattern is what enterprise legal and IT will require before any production agent gets a write scope. Copilot Concierge ships it on day one.
For one user: Two-layer brand voice with universal filter plus per-user profile. Drafts read like the user, not like a chatbot.
For a practice: Branded agent experience pattern. Customer agents face the same problem at company scale: they cannot all sound like the same generic AI. The two-layer model (corporate voice plus per-user profile) generalizes cleanly to customer-facing agents that need to sound like the brand.
For one user: Teams channel summaries with topic clustering. Microsoft To-Do as the action sink. Summaries instead of more notifications. Action items land in the existing task system.
For a practice: End-to-end M365 fluency on display. Mail, Teams, calendar, and To-Do woven into a single agent demonstrates the breadth Microsoft customers expect from a Principal Architect. The case study answers the "can you build on our stack" question with a working artifact.
The Agent Architecture case study is the production credential: a multi-skill agent system, ten weeks of continuous operation, one operator. Copilot Concierge is the Microsoft-platform companion: same architectural discipline, rebuilt on the stack customers actually deploy, designed to ship on a single tenant by mid-June on an evening cadence. Two case studies, two runtimes, one operator who shipped both.
The short version.
The Agent Architecture case study is the live production system on Claude Agent SDK. Copilot Concierge is the Microsoft-native generalization of one slice of that system: triage, draft generation, brand voice, Teams integration, and Microsoft To-Do as the action sink. Build starts the week of April 27, 2026. Phase 1 ships mid-May.
If you're building a Microsoft cloud practice, running a Copilot Studio or Agent 365 customer-zero program, or staffing a Principal Architect role where this pattern is part of the job, the architecture above is the working draft. Happy to walk through any of it live.
Daniel Lepel
Principal Microsoft Cloud Architect
daniel@lepel.us
212-252-9200
Albany, NY Capital District