---
description: Rules for data handling and preprocessing scripts in chemistry ML projects, emphasizing robust pipelines and appropriate techniques for chemical data.
globs: data_processing/**/*.py
---
- Implement robust data loading and preprocessing pipelines.
- Use appropriate techniques for handling chemical data (e.g., molecular fingerprints, SMILES strings).
- Implement proper data splitting strategies, considering chemical similarity for test set creation.
- Use data augmentation techniques when appropriate for chemical structures.
- Utilize efficient data structures for chemical representations.
- Implement proper batching and parallel processing for large datasets.
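
The splitting and batching rules above can be illustrated with a minimal, standard-library-only sketch. It assumes fingerprints are already available as sets of on-bit indices and that each molecule carries a group label computed upstream (for instance a Bemis-Murcko scaffold SMILES from RDKit). The function names `tanimoto`, `group_split`, and `batched` are hypothetical, and the greedy largest-group-first assignment is one common heuristic, not the only valid strategy.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    return len(a & b) / len(a | b)


def group_split(
    groups: Sequence[Hashable], test_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Split indices so that each group ends up entirely in train or entirely in test.

    Groups are visited from largest to smallest. A group goes to train if it
    still fits within the train budget, otherwise it goes to test.
    """
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Sort by descending size, breaking ties by first index so the result is deterministic.
    ordered = sorted(by_group.values(), key=lambda members: (-len(members), members[0]))

    train_budget = (1.0 - test_fraction) * len(groups)
    train: List[int] = []
    test: List[int] = []
    for members in ordered:
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items without loading everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


# Usage with made-up scaffold labels (illustrative data only).
scaffolds = ["benzene", "benzene", "benzene", "pyridine", "pyridine", "indole"]
train_idx, test_idx = group_split(scaffolds, test_fraction=0.25)
# train_idx == [0, 1, 2, 5], test_idx == [3, 4]
```

Because whole groups move together, no scaffold appears on both sides of the split, which is the point of a similarity-aware test set. Each batch from `batched` is an independent list, so it could be handed to a worker pool for parallel featurization if the dataset is large.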