Kevin Reuben Lee
45c1ce454b
feat: implement OCR to Excel data extraction workflow (Phases 2-6)
Implements the complete OCR-based document processing workflow described in
GitHub issue #763. This builds on the Phase 1 infrastructure commit (4a50ad8)
by adding all task implementation modules and supporting documentation.
## Task Modules Implemented (9 files):
- task-file-scanner.js: Recursive file discovery with glob patterns, filters
  already-processed files, creates prioritized processing queues
- task-ocr-process.js: OpenRouter API integration with Mistral OCR, retry
  logic with exponential backoff, batch processing with concurrency control
- task-file-converter.js: File format validation and conversion utilities,
  handles PDF (direct), Excel/MSG (placeholders for future implementation)
- task-data-parser.js: Parses OCR text into structured data using field
  definitions, type coercion (date, number, currency, string), field
  extraction with regex patterns, validation rules
- task-data-validator.js: Placeholder for interactive validation UI,
  auto-approves high confidence (≥0.85)
- task-excel-writer.js: Excel file write operations with automatic backup,
  atomic writes (placeholder - needs xlsx library integration)
- task-file-mover.js: Moves processed files to done folder, preserves folder
  structure
- task-batch-processor.js: Orchestrates complete workflow, integrates all
  task modules, end-to-end processing pipeline
- task-processing-reporter.js: Generates processing reports, saves processing
  logs as JSON
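The retry behaviour listed for task-ocr-process.js can be sketched roughly as
follows. This is a minimal illustration, not the module's actual code: the
names `backoffDelayMs` and `withRetry`, the 500 ms base delay, and the 8 s cap
are all assumptions, and `fn` stands in for the real OCR request.

```javascript
'use strict';

// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling from `baseMs` and capped at `maxMs`.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Default wait implementation; callers may inject their own.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn` until it resolves or `retries` extra attempts are used up.
// `fn` is a placeholder for the OCR request; no real API is called here.
async function withRetry(fn, options = {}) {
  const { retries = 3, baseMs = 500, maxMs = 8000, wait = sleep } = options;
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      // Only wait if another attempt is still to come.
      if (attempt < retries) {
        await wait(backoffDelayMs(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

Making `wait` injectable is a design choice for this sketch: tests can pass a
no-op and record the requested delays instead of actually sleeping.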
## Documentation & Examples:
- TROUBLESHOOTING.md: Comprehensive troubleshooting guide covering API key
  issues, OCR quality, file processing errors, Excel writing, performance
  tuning, debugging tips, and configuration examples for different use cases
- examples/sample-config.yaml: Complete example configuration file showing
  all available settings with detailed comments
## ESLint Configuration:
- Added override for src/modules/*/tasks/**/*.js to allow:
  - CommonJS patterns (require/module.exports) for task compatibility
  - Experimental Node.js fetch API usage
  - Unused parameters prefixed with underscore
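An override along these lines might look like the sketch below. It is
illustrative only: the commit message does not say which config format or
rules were used, so the flat-config shape, the variable name
`taskFilesOverride`, and the two plugin rules (which assume eslint-plugin-n
and eslint-plugin-unicorn are registered elsewhere) are assumptions. Only the
glob comes from the message itself.

```javascript
// Hypothetical flat-config entry for the task files.
const taskFilesOverride = {
  files: ['src/modules/*/tasks/**/*.js'],
  // Parse task files as CommonJS scripts.
  languageOptions: { sourceType: 'commonjs' },
  rules: {
    // Assumed rule: allow require/module.exports in task files.
    'unicorn/prefer-module': 'off',
    // Assumed rule: allow the global fetch API on Node.js versions
    // where it is still flagged as experimental.
    'n/no-unsupported-features/node-builtins': 'off',
    // Core ESLint rule: ignore unused parameters that start with "_".
    'no-unused-vars': ['error', { argsIgnorePattern: '^_' }],
  },
};
```

In a flat config this object would be one element of the exported array,
placed after the general rules so that it takes precedence for matching files.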
## Implementation Status:
- Phase 1: Infrastructure ✅ (committed: 4a50ad8)
- Phase 2: OCR & File Processing ✅
- Phase 3: Data Parsing & Validation ✅
- Phase 4: Excel Integration ✅ (placeholder - needs xlsx library)
- Phase 5: Batch Processing ✅
- Phase 6: Testing & Documentation ⏳ (unit tests pending)
## Next Steps:
- Add npm dependencies (xlsx, pdf-parse, @kenjiuno/msgreader)
- Implement actual Excel library integration
- Create unit tests with Jest
- Create integration tests with mock API
- Test with real-world data from issue #763
Related: #763
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-18 18:38:55 +08:00
Kevin Reuben Lee
4a50ad8b31
feat: add OCR to Excel data extraction workflow (Phase 1 - Infrastructure)
Add new BMM workflow for automated document processing using Mistral OCR
via OpenRouter API. This workflow extracts structured data from PDFs, Excel
files, and Outlook messages, consolidating results into a master Excel file.
This commit completes Phase 1 (Core Infrastructure) of the implementation
plan outlined in issue #763. Future phases will add the actual processing
tasks (OCR, parsing, Excel writing, etc.).
**New Components:**
- Data Extraction Specialist agent with OCR/parsing persona
- OCR to Excel workflow with 14-step interactive process
- Comprehensive configuration template
- Processing report template
- Validation checklist
- Complete documentation
**Features:**
- Multi-format support (PDF, XLSX, XLS, MSG)
- Confidence-based extraction validation
- Human-AI collaboration design
- Batch processing configuration
- Automatic backup system
- Audit trail and logging
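Confidence-based validation could work roughly as in the sketch below. The
0.85 threshold is the auto-approval value mentioned in the Phases 2-6 commit
message. The function name `triageFields` and the `{ name, value, confidence }`
field shape are assumptions made for illustration.

```javascript
// Threshold taken from the Phases 2-6 commit message (auto-approve at >= 0.85).
const AUTO_APPROVE_THRESHOLD = 0.85;

// Splits extracted fields into auto-approved ones and ones needing human review.
// Each field is assumed to look like { name, value, confidence }, with
// confidence in [0, 1]; a missing or non-numeric confidence goes to review.
function triageFields(fields, threshold = AUTO_APPROVE_THRESHOLD) {
  const approved = [];
  const needsReview = [];
  for (const field of fields) {
    const confident =
      typeof field.confidence === 'number' && field.confidence >= threshold;
    (confident ? approved : needsReview).push(field);
  }
  return { approved, needsReview };
}
```

Sending fields without a usable confidence score to review, rather than
approving them, keeps the human in the loop for anything the OCR step could
not score.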
**Files Added:**
- src/modules/bmm/agents/data-extraction.agent.yaml
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/workflow.yaml
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/config-template.yaml
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/template.md
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/instructions.md
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/checklist.md
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/README.md
**Next Steps:**
- Phase 2: Implement OCR & file processing tasks
- Phase 3: Implement data parsing & validation tasks
- Phase 4: Implement Excel integration tasks
- Phase 5: Implement batch processing & cleanup tasks
- Phase 6: Add tests and finalize documentation
Related to #763
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-18 18:13:11 +08:00
Brian Madison
4b6f34dff8
date removed from status file, status file renamed
2025-10-13 22:32:35 -05:00
Brian Madison
27586e6a40
context should use relative paths
2025-10-13 21:11:20 -05:00
Brian Madison
5eb410d622
update config re deprecated removed file
2025-10-13 19:29:19 -05:00
Brian Madison
f1965810a6
adv elicitation project updated to hopefully not be skipped as optional anymore. further workflow updates.
2025-10-13 00:33:06 -05:00
Brian Madison
36bf506241
all workflows aware
2025-10-12 22:19:28 -05:00
Brian Madison
88989d5403
master workflow integration
2025-10-12 18:10:23 -05:00
Brian Madison
c3c51945bb
docs update
2025-10-12 16:59:54 -05:00
Brian Madison
79ac3c91fe
central source of truth for workflow status, current, and next story or epic
2025-10-12 16:14:29 -05:00
Brian Madison
e61d58d480
workflow level 0 and 1 aligned with brownfield and quick dev
2025-10-12 15:53:24 -05:00
Brian Madison
ab05cdcdd2
split analyze workflow
2025-10-12 01:39:24 -05:00
Brian Madison
2b736a8594
brownfield document project workflow added to analyst
2025-10-12 00:49:12 -05:00
Brian Madison
4f16d368ac
minor dev agent updates
2025-10-11 19:45:25 -05:00
PinkyD
d76bcb5586
chore: cleaned up bad architecture file calls, legacy doc references, and case sensitivity issues to remove ambiguity (#718)
2025-10-10 09:26:49 -05:00
MeetNexus
5977227efc
fix: Correct path to instructions in bmad-init workflow (#663)
Co-authored-by: Brian <bmadcode@gmail.com>
2025-10-09 19:07:56 -05:00
PinkyD
b62e169bac
adjusted workflow installed_path to proper bmm workflow folders (#688)
2025-10-07 16:07:30 -05:00
Alex Verkhovsky
c9ffe202d5
feat(installer): default project name to directory (#681)
2025-10-05 22:12:37 -05:00
Brian Madison
33d893bef2
workflows added to sub items in plan project phase. updated single action checks to be ifs on the action.
2025-10-05 11:32:45 -05:00
Brian Madison
aefe72fd60
gdd updated
2025-10-04 22:52:38 -05:00
Brian Madison
16984c3d92
fix path bug
2025-10-04 21:33:19 -05:00
PinkyD
47658c00d5
Fixed bug with activation-steps.xml injecting wrong path (#674)
2025-10-04 21:04:33 -05:00
Brian Madison
c632564849
finish move of brainstorming to the core
2025-10-04 19:33:34 -05:00
Brian Madison
c7d76a3037
agent manifest generation, party mode uses it, and tea persona compression
2025-10-04 19:28:10 -05:00
Brian Madison
bbb37a7a86
brainstorming moved to core workflows part 2
2025-10-04 19:02:29 -05:00
Brian Madison
b6d8823d51
brainstorming moved to core workflows
2025-10-04 19:01:37 -05:00
Brian Madison
e60d5cc42d
removed deprecated src_impact
2025-10-04 18:43:24 -05:00
Brian Madison
3147589d0f
bomb agent updates
2025-10-04 17:35:37 -05:00
Brian Madison
94a2dad104
name and language will now persist better with most models
2025-10-04 16:12:42 -05:00
Brian Madison
9300ad1d71
subagents updated with consistent return info and frontmatter added where it was missing
2025-10-04 08:24:21 -05:00
Brian Madison
a747017520
docs updated and agent standalone builder working now from the main install flow
2025-10-04 01:26:38 -05:00
Brian Madison
5ee4cf535c
BoMB updates
2025-10-04 00:22:59 -05:00
Brian Madison
9e8c7f3503
bundle agents front matter optimized, along with the orchestrator's activation instructions
2025-10-03 21:46:53 -05:00
Brian Madison
5ac18cb55c
agent teams orchestration prompt improved
2025-10-03 19:08:34 -05:00
Brian Madison
fd01ad69f8
remove unneeded files
2025-10-03 11:54:32 -05:00
Brian Madison
3f40ef4756
agent updates
2025-10-02 21:45:59 -05:00
Brian Madison
c6704b4b6e
web bundles for team complete
2025-10-01 22:22:40 -05:00
Brian Madison
f077a31aa0
docs updated
2025-10-01 18:29:08 -05:00
PinkyD
5f0a318bdf
feature: Added detailed epics file generation that was missing (#669)
2025-10-01 14:01:56 -05:00
Brian Madison
25c3d50673
SubAgents in sub folders. installer improvements. BMM Flow document added
2025-10-01 09:12:21 -05:00
Brian Madison
56e7a61bd3
v6 flow documented and subagent organization
2025-10-01 08:50:16 -05:00
Brian Madison
05a3b4f3f1
hash file change checking integrated
2025-09-30 21:20:13 -05:00
Murat K Ozcan
df0c3e4bae
Port TEA commands into workflows and preload Murat knowledge (#660)
* Port TEA commands into workflows and preload Murat knowledge
* Broke the giant knowledge dump into curated fragments under src/modules/bmm/testarch/knowledge/
* Updated the web bundles for tea, and spot updates for analyst and sm
* Replaced the old TEA brief with an indexed knowledge system: the agent now loads topic-specific
  docs from knowledge/ via tea-index.csv, workflows reference those fragments, and risk/level/
  priority guidance lives in the new fragment files
---------
Co-authored-by: Murat Ozcan <murat@mac.lan>
2025-09-30 15:19:55 -05:00
Brian Madison
30fb0e67e1
analyst command fix
2025-09-30 01:41:09 -05:00
Brian Madison
e1fac26156
all agent bundles working
2025-09-30 01:38:39 -05:00
Brian Madison
108e4d8eb4
feat: add web activation instructions to bundled agents
- Created agent-activation-web.xml with bundled file access instructions
- Updated web-bundler to inject web activation into all agent bundles
- Agents now understand how to access <file> elements instead of filesystem
- Includes workflow execution instructions for bundled environments
- Generated new web bundles with activation blocks
2025-09-30 00:32:20 -05:00
Brian Madison
688a841127
missed a workflow update
2025-09-30 00:24:27 -05:00
Brian Madison
c26220daec
installer and bundler progress
2025-09-30 00:24:27 -05:00
Brian Madison
ae136ceb03
web_bundle info added to workflow yamls
2025-09-30 00:24:27 -05:00
Brian Madison
9934224230
workflows indicate web_bundle file inclusions
2025-09-30 00:24:27 -05:00