Design Challenges
High-friction product and design problems we need to solve. These are the things that will make or break adoption, independent of whether the tech works.
Conversational leverage
Natural language input is flexible (handles format variability), contextual (knows what you're talking about), and low-friction (feels like talking, not data entry). The AI can disambiguate, confirm, and prompt, turning passive systems into active collaborators.
Platform leverage
One unified data model means no sync problems and no translation layers. Cross-product triggers automate what would otherwise require manual discipline. Data captured once flows everywhere it's needed. The schema is designed for the learning loop.
1. Closing the loop
Hypothesis: Projects don't end cleanly. They fade out, scope creeps, clients go quiet. How does the system know a project is 'done' enough to trigger learning? Manual button? Inferred from activity? This is where most tools fail. The retro never happens.
Don't wait for 'done.' Capture continuously and synthesize on-demand. The AI can prompt for lightweight check-ins during the project ('How did that phase go?') rather than waiting for a big-bang retro at the end. When activity stops for N days, nudge for closure, but the learning has already been happening.
Check-ins feel lighter than forms. 'Hey, you wrapped the design phase last week. Quick gut check: how close was the estimate?' gets better response rates than a retro form. The AI infers closure signals from context ('sounds like you're wrapping up') and prompts accordingly.
Cross-product triggers. Relay sees client activity stop → nudges Retro. The system already has estimated vs actual hours because Blueprint and time data live in the same place. No manual reconciliation needed.
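A minimal sketch of what that inactivity nudge could look like, assuming a Project entity carries a last-activity timestamp updated by Relay, time entries, and conversation. The field names and the 14-day threshold are illustrative, not the real schema.

```typescript
// Sketch: nudge for closure when a project has been quiet for N days.
// Project shape and the threshold are illustrative assumptions, not the real schema.
interface Project {
  id: string;
  name: string;
  status: "active" | "closed";
  lastActivityAt: Date; // bumped by Relay activity, time entries, conversation, etc.
}

const INACTIVITY_THRESHOLD_DAYS = 14;

function daysSince(date: Date, now: Date = new Date()): number {
  return (now.getTime() - date.getTime()) / (1000 * 60 * 60 * 24);
}

// Candidates for a "sounds like this wrapped up. Quick debrief?" prompt.
function projectsToNudge(projects: Project[]): Project[] {
  return projects.filter(
    (p) => p.status === "active" && daysSince(p.lastActivityAt) >= INACTIVITY_THRESHOLD_DAYS
  );
}
```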
2. What counts as 'similar'?
Uncertain: For 'projects like this usually take X' to mean anything, you need to define similarity. Is it industry? Project type? Team size? Client sophistication? Deliverable format? This is a hard taxonomy problem, and if you get it wrong, the recommendations feel random.
Still working through this. Options: (a) let users define their own project types and learn within those buckets, (b) use embedding similarity on project descriptions rather than rigid taxonomy, (c) start with a basic taxonomy and let the AI surface when projects feel miscategorized. Probably some combination.
Unified schema helps. Every project has the same structure (type, phases, skills, client industry), so comparison is apples-to-apples. Embeddings on project descriptions + conversation history can find similarity without rigid taxonomy.
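One way option (b) could look in practice: rank closed projects by cosine similarity over embeddings of their descriptions (plus conversation history). The embedding source and the fields here are assumptions for illustration only.

```typescript
// Sketch: find "similar" projects by embedding distance rather than rigid taxonomy.
// Embeddings are assumed to be precomputed elsewhere; fields are illustrative.
interface EmbeddedProject {
  id: string;
  description: string;
  embedding: number[]; // from description + conversation history
  actualHours?: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank closed projects by similarity to a new scope's embedding.
function mostSimilar(newEmbedding: number[], history: EmbeddedProject[], k = 5): EmbeddedProject[] {
  return [...history]
    .sort(
      (a, b) =>
        cosineSimilarity(newEmbedding, b.embedding) - cosineSimilarity(newEmbedding, a.embedding)
    )
    .slice(0, k);
}
```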
3. Multi-stakeholder handoffs
Hypothesis: Sales scopes the project. PM staffs it. Delivery runs it. Finance closes it. Different people touch different phases, and the learning loop only works if data continuity survives each handoff.
The AI is the connective tissue. Each person talks to the same system, which maintains context across handoffs. When the PM picks up a project, the AI already knows what sales discussed. The conversation history *is* the handoff document. No separate 'handoff meeting' or 'transition doc.' Just pick up where the last person left off.
Instead of 'read this doc before taking over,' it's 'ask the AI what you need to know.' The AI summarizes context for a new stakeholder, flags things they should know, and answers questions about decisions made earlier. Natural language is the interface between roles.
Single source of truth. Sales, PM, delivery all read/write to the same Project entity. No sync problems, no 'which version is current?' The data *is* the handoff. No translation layer needed.
4. Trust calibration
Hypothesis: When should a user trust the AI estimate vs. their gut? Too confident too early = bad recommendations. Too hedgy = 'why am I using this?' You need to show confidence intervals and build trust over time with calibration feedback.
Show uncertainty explicitly. 'Based on 3 similar projects, this is 40-60 hours. But you've only closed 3 projects. Take this with a grain of salt.' As more data accumulates, confidence tightens. Also: let users override and track how often their overrides were right. Surfaces when the model is better than gut and vice versa.
Built-in feedback loop. EstimationModel already tracks confidence_level and sample_size. We can show 'based on N projects' and track override accuracy over time. The learning layer is designed for this from day 1.
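A rough shape for what that implies: an estimate record that carries its own uncertainty (confidence level, sample size, a range rather than a point) plus the override outcome. The confidence and sample-size fields come from EstimationModel as described above; the override fields are an assumed extension for calibration tracking.

```typescript
// Sketch: estimation record with explicit uncertainty plus override tracking.
// confidenceLevel and sampleSize mirror EstimationModel's confidence_level and
// sample_size; the override/actual fields are assumed extensions for calibration.
interface Estimate {
  projectId: string;
  lowHours: number;
  highHours: number;
  confidenceLevel: "low" | "medium" | "high";
  sampleSize: number;          // how many similar closed projects backed this range
  userOverrideHours?: number;  // what the user said instead, if they disagreed
  actualHours?: number;        // filled in at close
}

// After close: was the model or the user's gut closer to reality?
function overrideWasBetter(e: Estimate): boolean | undefined {
  if (e.actualHours === undefined || e.userOverrideHours === undefined) return undefined;
  const modelMidpoint = (e.lowHours + e.highHours) / 2;
  return Math.abs(e.userOverrideHours - e.actualHours) < Math.abs(modelMidpoint - e.actualHours);
}
```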
5. Estimation format variability
Hypothesis: Every firm scopes differently. T-shirt sizes. Hours. Story points. Phases. Line items. For learning to work, you need to normalize, but normalization loses nuance. How much structure do you impose?
Meet them in their format, normalize under the hood. The AI can accept 'this is a medium project, maybe 3 weeks' and internally map that to hour ranges based on their historical 'medium' projects. Over time, surface the translation. 'You said medium. Your mediums average 120 hours. Sound right?' Build shared vocabulary without forcing structure upfront.
Conversational input is inherently flexible. Users don't fill a form with predefined fields. They describe the project naturally. The AI extracts structure from prose. 'It's a redesign, probably 6-8 weeks, need a senior dev and a designer' becomes a structured scope without the user ever seeing a form.
Normalize internally, flex externally. Internal model uses hours; display layer can show t-shirt sizes or weeks. Per-org EstimationModel learns what YOUR 'medium' means in hours.
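A sketch of 'normalize internally, flex externally': learn what this org's 'medium' means in hours from its closed projects, then translate new t-shirt-size estimates under the hood. All names here are illustrative, not the per-org EstimationModel itself.

```typescript
// Sketch: learn what "medium" means for THIS org from its closed projects,
// so a new "medium" estimate can be mapped to an internal hour range.
type TShirtSize = "small" | "medium" | "large";

interface ClosedProject {
  declaredSize: TShirtSize; // what the user called it at scoping time
  actualHours: number;
}

// Per-org average actual hours for each declared size.
function sizeToHours(history: ClosedProject[]): Map<TShirtSize, number> {
  const totals = new Map<TShirtSize, { sum: number; n: number }>();
  for (const p of history) {
    const t = totals.get(p.declaredSize) ?? { sum: 0, n: 0 };
    totals.set(p.declaredSize, { sum: t.sum + p.actualHours, n: t.n + 1 });
  }
  const averages = new Map<TShirtSize, number>();
  for (const [size, { sum, n }] of totals) averages.set(size, sum / n);
  return averages; // e.g. "your mediums average 120 hours"
}
```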
6. Passive capture signal vs. noise
Hypothesis: If you're syncing Slack and calendars, how do you know what's project-related? A standup mention vs. a scope change vs. idle chatter. The AI has to attribute activity to projects accurately or the data is garbage.
Use conversation to disambiguate. When the AI sees activity it can't confidently attribute, it asks. 'Saw a 2-hour meeting with Acme yesterday. Was that for the website project or something else?' Low-friction confirmation is better than fully automated misattribution. Over time, the AI learns patterns ('meetings with this client are usually this project').
Turns passive capture into active confirmation. Instead of silently ingesting (and misattributing) data, the AI surfaces uncertainty and asks for help. Builds trust. Users know it won't pollute their data with guesses.
Project as anchor. Everything ties to a Project entity, so attribution has a target. CommunicationLog already has AI summary + scope_drift_flag, designed to extract signal from noise at the schema level.
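A sketch of the 'attach when confident, ask when not' rule. The confidence score, threshold, and event shape are assumptions about how the matching model would be wired in, not the real pipeline.

```typescript
// Sketch: attribute a captured event (Slack message, calendar entry) to a project
// only when confident; otherwise queue a one-line confirmation question.
// The confidence score and threshold are assumed stand-ins for the real classifier.
interface CapturedEvent {
  id: string;
  source: "slack" | "calendar";
  text: string;
}

interface Attribution {
  eventId: string;
  projectId: string;
  confidence: number; // 0..1 from whatever model does the matching
}

const CONFIDENCE_THRESHOLD = 0.8;

function routeEvent(attribution: Attribution): "auto_attach" | "ask_user" {
  // Below the threshold, ask: "Saw a 2-hour meeting with Acme yesterday.
  // Was that for the website project or something else?"
  return attribution.confidence >= CONFIDENCE_THRESHOLD ? "auto_attach" : "ask_user";
}
```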
7. Retro depth vs. friction
Response: 'Went fine' teaches nothing. But deep retros are time-consuming. How do you get enough signal without making it feel like homework? Prompted questions? AI-generated draft retros from project artifacts?
AI drafts the retro, user confirms or adjusts. The system already knows: estimated hours, actual hours (from time tracking or conversation), scope changes (from Relay), and deliverables. It can say: 'Looks like design took 30% longer than scoped. Was that because the client added rounds, or did we underestimate?' User just confirms, adds color, done.
Instead of 'fill out this retro form,' it's a 3-minute chat: 'Project closed. Quick debrief?' The AI asks pointed questions based on what it already knows. User isn't starting from blank. They're reacting to a draft.
Data already exists. System knows estimated hours, actual hours, scope changes, deliverables from other products. Retro is synthesis, not data entry. PhasePerformance captures at the right granularity automatically.
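A sketch of the synthesis step: compute per-phase variance from data the platform already has, so the AI can open with a pointed question instead of a blank form. PhasePerformance is named above; the specific fields and the 20% threshold here are illustrative assumptions.

```typescript
// Sketch: draft retro prompts from estimated vs. actual phase hours the system
// already has. Field names and the 20% threshold are illustrative.
interface PhasePerformance {
  phaseName: string;
  estimatedHours: number;
  actualHours: number;
  scopeChanges: number; // e.g. count of scope_drift_flag hits from Relay
}

function draftRetroQuestions(phases: PhasePerformance[]): string[] {
  return phases
    .filter((p) => Math.abs(p.actualHours - p.estimatedHours) / p.estimatedHours > 0.2)
    .map((p) => {
      const pct = Math.round(
        (Math.abs(p.actualHours - p.estimatedHours) / p.estimatedHours) * 100
      );
      const direction = p.actualHours > p.estimatedHours ? "over" : "under";
      const base = `${p.phaseName} ran ${pct}% ${direction} estimate`;
      return p.scopeChanges > 0
        ? `${base} with ${p.scopeChanges} scope change(s). Client-driven, or underestimated?`
        : `${base}. What happened?`;
    });
}
```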
8. Sensitive data comfort
Uncertain: SOWs contain pricing, margins, client names, sometimes strategy. Firms may be hesitant to put this in a system, especially one that 'learns.' What's the trust architecture?
Need to think through: (a) can we offer on-prem or single-tenant for larger firms? (b) what's the data isolation story for multi-tenant? (c) do we need anonymization options for the learning layer? (d) how do we handle data retention and deletion? This is less about product design and more about infrastructure and policy.
One security model. Single platform = one tenant isolation policy, not 7 tools with 7 different policies. Organization entity is the top-level boundary from day 1. Easier to reason about than stitched-together tools.
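A minimal sketch of what 'Organization as the top-level boundary' can mean in practice: every row carries an org id and every read path filters on it, so there is one isolation rule to audit instead of seven. Names are illustrative.

```typescript
// Sketch: one tenant-isolation rule applied everywhere. Every row carries an
// organizationId and no query runs without it. Names are illustrative.
interface OrgScoped {
  organizationId: string;
}

interface ProjectRow extends OrgScoped {
  id: string;
  name: string;
}

// Single choke point for tenant isolation.
function forOrg<T extends OrgScoped>(rows: T[], organizationId: string): T[] {
  return rows.filter((r) => r.organizationId === organizationId);
}
```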
What's not on this list (yet)
- Pricing model. Per-seat? Per-project? Usage-based? Affects adoption friction.
- Integration depth. How much do we build vs. sync from existing tools?
- Mobile experience. Is this desktop-first? Do field updates need mobile?
- Collaboration model. Real-time? Async? Who sees what?