# Task: Create Data Analysis Plan

## Description
Create a comprehensive data analysis plan for extracting insights, developing machine learning models, and implementing data pipelines to support project objectives.

## Input Required
- Business requirements or problem statement
- Available data sources and descriptions
- Expected outcomes or success criteria
- Technical constraints or limitations

## Steps

1. **Problem Definition**
   - Clearly articulate the business problem or opportunity
   - Define specific questions to be answered through analysis
   - Establish success metrics and evaluation criteria
   - Identify stakeholders and their requirements

2. **Data Assessment**
   - Inventory available data sources
   - Assess data quality, completeness, and accessibility
   - Identify data gaps and acquisition needs
   - Evaluate data privacy and compliance requirements
   - Define data sampling strategy if applicable

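As one way to make the completeness check in the Data Assessment step concrete, the sketch below computes the share of non-missing values per column. The sample data, the column names, and the `MISSING_TOKENS` set are hypothetical placeholders, not part of this task definition; a real plan would substitute its own sources and its own definition of "missing".

```python
import csv
import io

# Hypothetical sample data; a real assessment would read from the actual source.
SAMPLE_CSV = """customer_id,age,country,signup_date
1,34,DE,2023-01-05
2,,FR,2023-02-11
3,29,,2023-02-28
4,41,US,
"""

# Assumed set of strings that count as "missing"; adjust per data source.
MISSING_TOKENS = {"", "na", "n/a", "null", "none"}


def completeness_report(rows, columns):
    """Return the fraction of non-missing values for each column."""
    total = len(rows)
    report = {}
    for col in columns:
        present = sum(
            1
            for row in rows
            if (row.get(col) or "").strip().lower() not in MISSING_TOKENS
        )
        report[col] = present / total if total else 0.0
    return report


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
report = completeness_report(rows, reader.fieldnames)
for column, ratio in report.items():
    print(f"{column}: {ratio:.0%} complete")
```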
3. **Exploratory Analysis Planning**
   - Define key variables to explore
   - Plan initial data profiling and visualization
   - Identify potential relationships to investigate
   - Design statistical tests to validate hypotheses
   - Plan for outlier detection and handling

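The outlier-detection item in the Exploratory Analysis Planning step leaves the method open. A common starting point is the interquartile-range (IQR) rule, sketched here with the standard library only. The multiplier `k=1.5` is the conventional default rather than a requirement of this plan, and `order_values` is made-up illustration data.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one obviously extreme value.
order_values = [12.0, 14.5, 13.2, 15.1, 14.0, 13.8, 98.0, 12.9]
print(iqr_outliers(order_values))
```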
4. **Feature Engineering Strategy**
   - Identify potential features to create
   - Plan transformations and encoding methods
   - Define feature selection approach
   - Document dimensionality reduction techniques if needed
   - Plan feature validation methods

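To illustrate what "transformations and encoding methods" in the Feature Engineering Strategy step might cover, the following sketch shows one-hot encoding for a categorical column and z-score standardization for a numeric one. It fits the parameters on training data and reuses them afterwards, which helps avoid leaking information from validation data. All function names and sample values are hypothetical.

```python
import statistics


def fit_one_hot(categories):
    """Learn a stable column order from the categories seen in training."""
    return sorted(set(categories))


def one_hot(value, vocabulary):
    """Encode one value; unseen categories map to an all-zero vector."""
    return [1 if value == known else 0 for known in vocabulary]


def fit_standardizer(values):
    """Learn mean and population standard deviation from training values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, std or 1.0  # fall back to 1.0 for constant columns


def standardize(value, mean, std):
    """Apply the z-score transform with previously fitted parameters."""
    return (value - mean) / std


# Hypothetical training columns.
train_plans = ["basic", "pro", "basic", "enterprise"]
train_spend = [10.0, 20.0, 30.0, 40.0]

vocab = fit_one_hot(train_plans)
mean, std = fit_standardizer(train_spend)

print(one_hot("pro", vocab))
print(one_hot("trial", vocab))  # category not seen during fitting
print(standardize(25.0, mean, std))
```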
5. **Model Development Strategy**
   - Select candidate algorithms based on problem type
   - Define training and validation approach
   - Plan hyperparameter tuning methodology
   - Establish model evaluation metrics
   - Design model interpretability approach

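For the training and validation approach in the Model Development Strategy step, k-fold cross-validation is one widely used option. The sketch below builds the folds by hand and scores a majority-class baseline, which gives a reference accuracy any candidate model should beat. The fold count, the seed, and the label distribution are illustrative assumptions, and the helper assumes at least `k` samples.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


def majority_baseline_accuracy(labels, k=5, seed=0):
    """Mean validation accuracy of always predicting the training majority class."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(labels), k, seed):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        correct = sum(1 for i in val_idx if labels[i] == majority)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


# Hypothetical imbalanced binary labels: 70 negatives, 30 positives.
labels = [0] * 70 + [1] * 30
print(majority_baseline_accuracy(labels))
```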
6. **Data Pipeline Architecture**
   - Design data ingestion processes
   - Plan data transformation and storage
   - Define model training pipeline
   - Design inference pipeline for production
   - Plan for monitoring and retraining

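The stages named in the Data Pipeline Architecture step can be prototyped as plain functions chained in order before committing to a specific orchestration tool. The sketch below is a toy version under stated assumptions: the record fields, the threshold rule standing in for a trained model, and the drift tolerance are all hypothetical.

```python
from functools import reduce


def ingest(raw_records):
    """Ingestion stage: drop records missing the required field."""
    return [r for r in raw_records if r.get("amount") is not None]


def transform(records):
    """Transformation stage: derive the feature the model consumes."""
    return [{**r, "amount_k": r["amount"] / 1000} for r in records]


def predict(records, threshold=1.0):
    """Inference stage: a placeholder rule standing in for a trained model."""
    return [{**r, "flagged": r["amount_k"] > threshold} for r in records]


def run_pipeline(data, stages):
    """Apply each stage in order, feeding each output into the next stage."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


def needs_retraining(flag_rate, baseline_rate, tolerance=0.10):
    """Monitoring hook: signal retraining when the flag rate drifts from baseline."""
    return abs(flag_rate - baseline_rate) > tolerance


# Hypothetical input records.
raw = [
    {"id": 1, "amount": 500},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 2500},
]
scored = run_pipeline(raw, [ingest, transform, predict])
flag_rate = sum(r["flagged"] for r in scored) / len(scored)
print(scored)
print(needs_retraining(flag_rate, baseline_rate=0.2))
```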
7. **Implementation Roadmap**
   - Create phased implementation plan
   - Establish milestones and deliverables
   - Identify required resources and tools
   - Develop timeline aligned with project goals
   - Plan for knowledge transfer and documentation

8. **Review and Validation**
   - Validate plan against business objectives
   - Ensure technical feasibility
   - Confirm alignment with project timeline
   - Verify ethical considerations are addressed

## Output
A comprehensive data analysis plan that includes:
- Problem definition and success criteria
- Data assessment and preparation strategy
- Exploratory analysis approach
- Feature engineering plan
- Model development methodology
- Data pipeline architecture
- Implementation roadmap and timeline

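If the plan is tracked programmatically, the seven output components listed above could be mirrored in a small data structure that reports which sections are still empty. This is only a sketch: the `AnalysisPlan` class, its field names, and the draft text are hypothetical and not prescribed by this task.

```python
from dataclasses import dataclass, fields


@dataclass
class AnalysisPlan:
    """One free-text field per required output section (names are assumptions)."""

    problem_definition: str = ""
    data_assessment: str = ""
    exploratory_analysis: str = ""
    feature_engineering: str = ""
    model_development: str = ""
    pipeline_architecture: str = ""
    implementation_roadmap: str = ""

    def missing_sections(self):
        """Return the names of sections that are still empty or whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical draft with only the first two sections written.
draft = AnalysisPlan(
    problem_definition="Placeholder problem statement",
    data_assessment="Placeholder data inventory",
)
print(draft.missing_sections())
```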
## Validation Criteria
- Plan addresses the business problem completely
- Data sources and quality issues are thoroughly assessed
- Modeling approach is appropriate for the problem type
- Technical implementation is feasible within constraints
- Ethical considerations are properly addressed
- Plan can be executed within project timeline