Design Challenges
High-friction product and design problems we need to solve. These are the things that will make or break adoption, independent of whether the tech works.
Conversational leverage
Natural language input is flexible (handles format variability), contextual (knows what you're talking about), and low-friction (feels like talking, not data entry). The AI can disambiguate, confirm, and prompt, turning passive systems into active collaborators.
Platform leverage
One unified data model means no sync problems and no translation layers. Cross-product triggers automate what would otherwise require manual discipline. Data captured once flows everywhere it's needed. The schema is designed for the learning loop.
1. Closing the loop
Hypothesis: Projects don't end cleanly. They fade out, scope creeps, clients go quiet. How does the system know a project is 'done' enough to trigger learning? Manual button? Inferred from activity? This is where most tools fail. The retro never happens.
Don't wait for 'done.' Capture continuously and synthesize on-demand. The AI can prompt for lightweight check-ins during the project ('How did that phase go?') rather than waiting for a big-bang retro at the end. When activity stops for N days, nudge for closure, but the learning has already been happening.
Check-ins feel lighter than forms. 'Hey, you wrapped the design phase last week. Quick gut check: how close was the estimate?' gets better response rates than a retro form. The AI infers closure signals from context ('sounds like you're wrapping up') and prompts accordingly.
Cross-product triggers. Relay sees client activity stop → nudges Retro. The system already has estimated vs actual hours because Blueprint and time data live in the same place. No manual reconciliation needed.
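A minimal sketch of what that inactivity nudge could look like, assuming a Project entity carries a last-activity timestamp updated by Relay, time entries, and conversation. The field names and the 14-day threshold are illustrative, not the real schema.

```typescript
// Sketch: nudge for closure when a project has been quiet for N days.
// Project shape and the threshold are illustrative assumptions, not the real schema.
interface Project {
  id: string;
  name: string;
  status: "active" | "closed";
  lastActivityAt: Date; // bumped by Relay activity, time entries, conversation, etc.
}

const INACTIVITY_THRESHOLD_DAYS = 14;

function daysSince(date: Date, now: Date = new Date()): number {
  return (now.getTime() - date.getTime()) / (1000 * 60 * 60 * 24);
}

// Candidates for a "sounds like this wrapped up. Quick debrief?" prompt.
function projectsToNudge(projects: Project[]): Project[] {
  return projects.filter(
    (p) => p.status === "active" && daysSince(p.lastActivityAt) >= INACTIVITY_THRESHOLD_DAYS
  );
}
```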
2. What counts as 'similar'?
Uncertain: For 'projects like this usually take X' to mean anything, you need to define similarity. Is it industry? Project type? Team size? Client sophistication? Deliverable format? This is a hard taxonomy problem, and if you get it wrong, the recommendations feel random.
Still working through this. Options: (a) let users define their own project types and learn within those buckets, (b) use embedding similarity on project descriptions rather than rigid taxonomy, (c) start with a basic taxonomy and let the AI surface when projects feel miscategorized. Probably some combination.
Unified schema helps. Every project has the same structure (type, phases, skills, client industry), so comparison is apples-to-apples. Embeddings on project descriptions + conversation history can find similarity without rigid taxonomy.
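One way option (b) could look in practice: rank closed projects by cosine similarity over embeddings of their descriptions (plus conversation history). The embedding source and the fields here are assumptions for illustration only.

```typescript
// Sketch: find "similar" projects by embedding distance rather than rigid taxonomy.
// Embeddings are assumed to be precomputed elsewhere; fields are illustrative.
interface EmbeddedProject {
  id: string;
  description: string;
  embedding: number[]; // from description + conversation history
  actualHours?: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank closed projects by similarity to a new scope's embedding.
function mostSimilar(newEmbedding: number[], history: EmbeddedProject[], k = 5): EmbeddedProject[] {
  return [...history]
    .sort(
      (a, b) =>
        cosineSimilarity(newEmbedding, b.embedding) - cosineSimilarity(newEmbedding, a.embedding)
    )
    .slice(0, k);
}
```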
3. Multi-stakeholder handoffs
Hypothesis: Sales scopes the project. PM staffs it. Delivery runs it. Finance closes it. Different people touch different phases, and the learning loop only works if data continuity survives each handoff.
The AI is the connective tissue. Each person talks to the same system, which maintains context across handoffs. When the PM picks up a project, the AI already knows what sales discussed. The conversation history *is* the handoff document. No separate 'handoff meeting' or 'transition doc.' Just pick up where the last person left off.
Instead of 'read this doc before taking over,' it's 'ask the AI what you need to know.' The AI summarizes context for a new stakeholder, flags things they should know, and answers questions about decisions made earlier. Natural language is the interface between roles.
Single source of truth. Sales, PM, delivery all read/write to the same Project entity. No sync problems, no 'which version is current?' The data *is* the handoff. No translation layer needed.
4. Trust calibration
Hypothesis: When should a user trust the AI estimate vs. their gut? Too confident too early = bad recommendations. Too hedgy = 'why am I using this?' You need to show confidence intervals and build trust over time with calibration feedback.
Show uncertainty explicitly. 'Based on 3 similar projects, this is 40-60 hours. But you've only closed 3 projects. Take this with a grain of salt.' As more data accumulates, confidence tightens. Also: let users override and track how often their overrides were right. Surfaces when the model is better than gut and vice versa.
Built-in feedback loop. EstimationModel already tracks confidence_level and sample_size. We can show 'based on N projects' and track override accuracy over time. The learning layer is designed for this from day 1.
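A rough shape for what that implies: an estimate record that carries its own uncertainty (confidence level, sample size, a range rather than a point) plus the override outcome. The confidence and sample-size fields come from EstimationModel as described above; the override fields are an assumed extension for calibration tracking.

```typescript
// Sketch: estimation record with explicit uncertainty plus override tracking.
// confidenceLevel and sampleSize mirror EstimationModel's confidence_level and
// sample_size; the override/actual fields are assumed extensions for calibration.
interface Estimate {
  projectId: string;
  lowHours: number;
  highHours: number;
  confidenceLevel: "low" | "medium" | "high";
  sampleSize: number;          // how many similar closed projects backed this range
  userOverrideHours?: number;  // what the user said instead, if they disagreed
  actualHours?: number;        // filled in at close
}

// After close: was the model or the user's gut closer to reality?
function overrideWasBetter(e: Estimate): boolean | undefined {
  if (e.actualHours === undefined || e.userOverrideHours === undefined) return undefined;
  const modelMidpoint = (e.lowHours + e.highHours) / 2;
  return Math.abs(e.userOverrideHours - e.actualHours) < Math.abs(modelMidpoint - e.actualHours);
}
```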
5. Estimation format variability
Hypothesis: Every firm scopes differently. T-shirt sizes. Hours. Story points. Phases. Line items. For learning to work, you need to normalize, but normalization loses nuance. How much structure do you impose?
Meet them in their format, normalize under the hood. The AI can accept 'this is a medium project, maybe 3 weeks' and internally map that to hour ranges based on their historical 'medium' projects. Over time, surface the translation. 'You said medium. Your mediums average 120 hours. Sound right?' Build shared vocabulary without forcing structure upfront.
Conversational input is inherently flexible. Users don't fill a form with predefined fields. They describe the project naturally. The AI extracts structure from prose. 'It's a redesign, probably 6-8 weeks, need a senior dev and a designer' becomes a structured scope without the user ever seeing a form.
Normalize internally, flex externally. Internal model uses hours; display layer can show t-shirt sizes or weeks. Per-org EstimationModel learns what YOUR 'medium' means in hours.
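A sketch of 'normalize internally, flex externally': learn what this org's 'medium' means in hours from its closed projects, then translate new t-shirt-size estimates under the hood. All names here are illustrative, not the per-org EstimationModel itself.

```typescript
// Sketch: learn what "medium" means for THIS org from its closed projects,
// so a new "medium" estimate can be mapped to an internal hour range.
type TShirtSize = "small" | "medium" | "large";

interface ClosedProject {
  declaredSize: TShirtSize; // what the user called it at scoping time
  actualHours: number;
}

// Per-org average actual hours for each declared size.
function sizeToHours(history: ClosedProject[]): Map<TShirtSize, number> {
  const totals = new Map<TShirtSize, { sum: number; n: number }>();
  for (const p of history) {
    const t = totals.get(p.declaredSize) ?? { sum: 0, n: 0 };
    totals.set(p.declaredSize, { sum: t.sum + p.actualHours, n: t.n + 1 });
  }
  const averages = new Map<TShirtSize, number>();
  for (const [size, { sum, n }] of totals) averages.set(size, sum / n);
  return averages; // e.g. "your mediums average 120 hours"
}
```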
6. Passive capture signal vs. noise
Hypothesis: If you're syncing Slack and calendars, how do you know what's project-related? A standup mention vs. a scope change vs. idle chatter. The AI has to attribute activity to projects accurately or the data is garbage.
Use conversation to disambiguate. When the AI sees activity it can't confidently attribute, it asks. 'Saw a 2-hour meeting with Acme yesterday. Was that for the website project or something else?' Low-friction confirmation is better than fully automated misattribution. Over time, the AI learns patterns ('meetings with this client are usually this project').
Turns passive capture into active confirmation. Instead of silently ingesting (and misattributing) data, the AI surfaces uncertainty and asks for help. Builds trust. Users know it won't pollute their data with guesses.
Project as anchor. Everything ties to a Project entity, so attribution has a target. CommunicationLog already has AI summary + scope_drift_flag, designed to extract signal from noise at the schema level.
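A sketch of the 'attach when confident, ask when not' rule. The confidence score, threshold, and event shape are assumptions about how the matching model would be wired in, not the real pipeline.

```typescript
// Sketch: attribute a captured event (Slack message, calendar entry) to a project
// only when confident; otherwise queue a one-line confirmation question.
// The confidence score and threshold are assumed stand-ins for the real classifier.
interface CapturedEvent {
  id: string;
  source: "slack" | "calendar";
  text: string;
}

interface Attribution {
  eventId: string;
  projectId: string;
  confidence: number; // 0..1 from whatever model does the matching
}

const CONFIDENCE_THRESHOLD = 0.8;

function routeEvent(attribution: Attribution): "auto_attach" | "ask_user" {
  // Below the threshold, ask: "Saw a 2-hour meeting with Acme yesterday.
  // Was that for the website project or something else?"
  return attribution.confidence >= CONFIDENCE_THRESHOLD ? "auto_attach" : "ask_user";
}
```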
7. Retro depth vs. friction
Response: 'Went fine' teaches nothing. But deep retros are time-consuming. How do you get enough signal without making it feel like homework? Prompted questions? AI-generated draft retros from project artifacts?
AI drafts the retro, user confirms or adjusts. The system already knows: estimated hours, actual hours (from time tracking or conversation), scope changes (from Relay), and deliverables. It can say: 'Looks like design took 30% longer than scoped. Was that because the client added rounds, or did we underestimate?' User just confirms, adds color, done.
Instead of 'fill out this retro form,' it's a 3-minute chat: 'Project closed. Quick debrief?' The AI asks pointed questions based on what it already knows. User isn't starting from blank. They're reacting to a draft.
Data already exists. System knows estimated hours, actual hours, scope changes, deliverables from other products. Retro is synthesis, not data entry. PhasePerformance captures at the right granularity automatically.
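A sketch of the synthesis step: compute per-phase variance from data the platform already has, so the AI can open with a pointed question instead of a blank form. PhasePerformance is named above; the specific fields and the 20% threshold here are illustrative assumptions.

```typescript
// Sketch: draft retro prompts from estimated vs. actual phase hours the system
// already has. Field names and the 20% threshold are illustrative.
interface PhasePerformance {
  phaseName: string;
  estimatedHours: number;
  actualHours: number;
  scopeChanges: number; // e.g. count of scope_drift_flag hits from Relay
}

function draftRetroQuestions(phases: PhasePerformance[]): string[] {
  return phases
    .filter((p) => Math.abs(p.actualHours - p.estimatedHours) / p.estimatedHours > 0.2)
    .map((p) => {
      const pct = Math.round(
        (Math.abs(p.actualHours - p.estimatedHours) / p.estimatedHours) * 100
      );
      const direction = p.actualHours > p.estimatedHours ? "over" : "under";
      const base = `${p.phaseName} ran ${pct}% ${direction} estimate`;
      return p.scopeChanges > 0
        ? `${base} with ${p.scopeChanges} scope change(s). Client-driven, or underestimated?`
        : `${base}. What happened?`;
    });
}
```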
8. Sensitive data comfort
Uncertain: SOWs contain pricing, margins, client names, sometimes strategy. Firms may be hesitant to put this in a system, especially one that 'learns.' What's the trust architecture?
Need to think through: (a) can we offer on-prem or single-tenant for larger firms? (b) what's the data isolation story for multi-tenant? (c) do we need anonymization options for the learning layer? (d) how do we handle data retention and deletion? This is less about product design and more about infrastructure and policy.
One security model. Single platform = one tenant isolation policy, not 7 tools with 7 different policies. Organization entity is the top-level boundary from day 1. Easier to reason about than stitched-together tools.
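A minimal sketch of what 'Organization as the top-level boundary' can mean in practice: every row carries an org id and every read path filters on it, so there is one isolation rule to audit instead of seven. Names are illustrative.

```typescript
// Sketch: one tenant-isolation rule applied everywhere. Every row carries an
// organizationId and no query runs without it. Names are illustrative.
interface OrgScoped {
  organizationId: string;
}

interface ProjectRow extends OrgScoped {
  id: string;
  name: string;
}

// Single choke point for tenant isolation.
function forOrg<T extends OrgScoped>(rows: T[], organizationId: string): T[] {
  return rows.filter((r) => r.organizationId === organizationId);
}
```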
What's not on this list (yet)
- Pricing model. Per-seat? Per-project? Usage-based? Affects adoption friction.
- Integration depth. How much do we build vs. sync from existing tools?
- Mobile experience. Is this desktop-first? Do field updates need mobile?
- Collaboration model. Real-time? Async? Who sees what?