Kevin Reuben Lee
4a50ad8b31
feat: add OCR to Excel data extraction workflow (Phase 1 - Infrastructure)
Add a new BMM workflow for automated document processing using Mistral
OCR via the OpenRouter API. The workflow is designed to extract structured
data from PDFs, Excel files, and Outlook messages and to consolidate the
results into a master Excel file.
This commit completes Phase 1 (Core Infrastructure) of the implementation
plan outlined in issue #763. Future phases will add the actual processing
tasks (OCR, parsing, Excel writing, etc.).
**New Components:**
- Data Extraction Specialist agent with OCR/parsing persona
- OCR to Excel workflow with 14-step interactive process
- Comprehensive configuration template
- Processing report template
- Validation checklist
- Complete documentation
**Features:**
- Multi-format support (PDF, XLSX, XLS, MSG)
- Confidence-based extraction validation
- Human-AI collaboration design
- Batch processing configuration
- Automatic backup system
- Audit trail and logging
**Files Added:**
- src/modules/bmm/agents/data-extraction.agent.yaml
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/workflow.yaml
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/config-template.yaml
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/template.md
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/instructions.md
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/checklist.md
- src/modules/bmm/workflows/data-extraction/ocr-to-excel/README.md
**Next Steps:**
- Phase 2: Implement OCR & file processing tasks
- Phase 3: Implement data parsing & validation tasks
- Phase 4: Implement Excel integration tasks
- Phase 5: Implement batch processing & cleanup tasks
- Phase 6: Add tests and finalize documentation
Related to #763
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-18 18:13:11 +08:00