BMAD-METHOD/docs/VCS_DETECTION_CONFIDENCE.md

# VCS Workflow Detection Confidence Scoring

## Overview

The VCS auto-detection system uses a confidence-based scoring mechanism to suggest (not decide) the most likely workflow pattern. This document explains how confidence scores are calculated and interpreted.

## Core Principle

**"Detection as a HINT, not a DECISION"**

Even with 100% confidence, we always confirm with the user. Auto-detection saves time but doesn't replace human judgment.

## Confidence Score Calculation

### Score Range

- **0.0 - 1.0** (0% - 100%)
- **Threshold for suggestion: 0.7** (70%)
- Below threshold → marked as "unclear" → trigger clarifying questions

### Workflow Indicators and Weights

#### GitFlow (Maximum Score: 1.0)

| Indicator             | Weight | Detection Method                            |
| --------------------- | ------ | ------------------------------------------- |
| Develop branch exists | 0.3    | Check for `develop` or `development` branch |
| Release branches      | 0.3    | Pattern match `release/*` branches          |
| Hotfix branches       | 0.2    | Pattern match `hotfix/*` branches           |
| Version tags          | 0.2    | Tags matching `v*` pattern                  |

#### GitHub Flow (Maximum Score: 1.0)

| Indicator            | Weight | Detection Method                          |
| -------------------- | ------ | ----------------------------------------- |
| PR/MR merges         | 0.3    | Commit messages with "Merge pull request" |
| Short-lived features | 0.3    | Feature branches < 7 days lifespan        |
| Squash merges        | 0.2    | Commits with `(#\d+)` pattern             |
| No develop branch    | 0.2    | Absence of develop/development branch     |

#### Trunk-Based Development (Maximum Score: 1.0)

| Indicator           | Weight | Detection Method                         |
| ------------------- | ------ | ---------------------------------------- |
| Direct main commits | 0.4    | >50% commits directly to main/master     |
| Very short branches | 0.3    | Branches living < 1 day                  |
| Feature flags       | 0.3    | Commits mentioning feature flags/toggles |

## Confidence Interpretation

### High Confidence (≥ 70%)

```yaml
presentation:
  title: 'Detected workflow: {workflow}'
  confidence: '{score}%'
  action: 'Present with evidence and ask for confirmation'
```

Example:

```
🔍 Detected workflow: **GitFlow** (confidence: 85%)

Evidence:
✓ Found develop branch
✓ Found 3 release branches
✓ Found 5 version tags

Is this correct?
```

### Medium Confidence (40% - 69%)

```yaml
presentation:
  title: 'Possible workflow detected'
  action: 'Show evidence but emphasize uncertainty'
  fallback: 'Offer clarifying questions'
```

### Low Confidence (< 40%)

```yaml
presentation:
  title: 'Could not confidently detect workflow'
  action: 'Skip to clarifying questions or manual selection'
```

## Migration Detection

When patterns differ between time periods:

```yaml
time_windows:
  recent: 'last 30 days'
  historical: '30-90 days ago'

if_different:
  confidence_penalty: -0.2 # Reduce confidence
  action: 'Alert user about possible migration'
```

## Edge Cases and Adjustments

### Monorepo Detection

- Multiple package.json/go.mod files → reduce confidence by 0.1
- Different patterns in subdirectories → mark as "complex"

### Fresh Repository

- Less than 10 commits → automatically mark as "unclear"
- No branches besides main → suggest starting with GitHub Flow

### Polluted History

- Imported/migrated repos → check commit dates for anomalies
- Fork detection → warn about inherited patterns

## Confidence Improvement via Questions

When initial confidence is low, progressive questions can increase confidence:

```yaml
question_weights:
  team_size:
    '1 developer': { trunk_based: +0.3 }
    '2-5 developers': { github_flow: +0.2 }
    '6+ developers': { gitflow: +0.2 }

  release_frequency:
    'Daily': { trunk_based: +0.3 }
    'Weekly': { github_flow: +0.3 }
    'Monthly+': { gitflow: +0.3 }

  version_maintenance:
    'Yes': { gitflow: +0.4 }
    'No': { github_flow: +0.2, trunk_based: +0.2 }
```

## Caching Strategy

```yaml
cache_config:
  validity_period: 7_days

  on_cache_hit:
    if_expired: 'Re-run detection'
    if_valid: 'Ask for confirmation of cached result'

  invalidate_on:
    - Major workflow change detected
    - User explicitly requests re-detection
    - Cache older than 7 days
```

## Implementation Guidelines

### For Agent Developers

1. **Always treat detection as advisory**

   ```python
   if detection.confidence >= 0.7:
       suggest_workflow(detection.workflow)
   else:
       ask_clarifying_questions()
   ```

2. **Present evidence transparently**

   ```python
   for indicator in detection.evidence:
       print(f"✓ {indicator}")
   ```

3. **Allow easy override**
   ```python
   # Always provide escape hatch
   options.append("None of the above")
   ```

### For Users

1. **High confidence doesn't mean certainty** - Always review the suggestion
2. **Evidence matters more than score** - Check if the evidence matches your actual workflow
3. **Migration is normal** - If you're changing workflows, tell BMAD
4. **Custom is OK** - Don't force-fit into standard patterns

## Testing Confidence Scores

Test scenarios and expected confidence ranges:

| Scenario                              | Expected Confidence | Expected Workflow |
| ------------------------------------- | ------------------- | ----------------- |
| Clean GitFlow with all branches       | 90-100%             | GitFlow           |
| GitHub Flow with consistent PR merges | 70-85%              | GitHub Flow       |
| Mixed patterns                        | 30-60%              | Unclear           |
| Fresh repo (<10 commits)              | 0-30%               | Unclear           |
| Trunk-based with feature flags        | 70-90%              | Trunk-based       |

## Future Improvements

1. **Machine Learning Enhancement**
   - Learn from user corrections
   - Adjust weights based on success rate

2. **Extended Pattern Recognition**
   - Detect GitLab Flow
   - Recognize scaled patterns (e.g., Scaled Trunk-Based)

3. **Context-Aware Detection**
   - Consider repository language/framework
   - Account for team size if available

## Conclusion

Confidence scoring enables intelligent suggestions while respecting user autonomy. The goal is to save time for the 80% common cases while gracefully handling the 20% edge cases.

Remember: **The best workflow is the one your team actually follows, not what the detector suggests.**