Abstract
Determining the pKa values of various C–H sites in organic molecules offers valuable insights for synthetic chemists in predicting reaction sites. As molecular complexity increases, this task becomes more challenging. This paper introduces pKalculator, a quantum chemistry (QM)-based workflow for automatic computations of C–H pKa values, which is used to generate a training dataset for a machine learning (ML) model. The QM workflow is benchmarked against 695 experimentally determined C–H pKa values in DMSO. The ML model is trained on a diverse dataset of 775 molecules with 3910 C–H sites. Our ML model predicts C–H pKa values with a mean absolute error (MAE) and a root mean squared error (RMSE) of 1.24 and 2.15 pKa units, respectively. Furthermore, we employ our model on 1043 pKa-dependent reactions (aldol, Claisen, and Michael) and successfully indicate the reaction sites with a Matthews correlation coefficient (MCC) of 0.82.
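For readers unfamiliar with the three metrics quoted in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not taken from pKalculator; the function names and the numbers in the usage lines are hypothetical and serve only as illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |reference - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared deviation."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator vanishes (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical reference and predicted pKa values (illustration only).
reference = [30.0, 25.0]
predicted = [31.0, 23.0]
print(mae(reference, predicted))
print(rmse(reference, predicted))

# Hypothetical site-classification counts (illustration only).
print(mcc(tp=5, tn=5, fp=0, fn=0))
```

MAE and RMSE are expressed in pKa units, and RMSE weights large deviations more heavily, so an RMSE well above the MAE suggests a small number of large outliers. MCC ranges from -1 to 1 and stays informative when non-reactive sites greatly outnumber reactive ones.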
| Original language | English |
|---|---|
| Journal | Beilstein Journal of Organic Chemistry |
| Volume | 20 |
| Pages (from-to) | 1614-1622 |
| Number of pages | 9 |
| ISSN | 2195-951X |
| DOIs | |
| Publication status | Published - 2024 |
Bibliographical note
Funding Information: This work was funded by the Independent Research Foundation Denmark (DFF; grant number 1032-00129B).
Publisher Copyright: © 2024 Borup et al.
Keywords
- pKa predictor
- pKa values