In recent years, increasingly complex machine learning methods have become state of the art in modelling wind turbine power curves based on operational data. While these methods often exhibit superior performance on test sets, they face criticism due to a perceived lack of transparency and concerns about their robustness in the dynamic, non-stationary environments in which wind turbines operate. In this work, we address these issues and present a framework that leverages explainable artificial intelligence methods to gain systematic insights into data-driven power curve models. At its core, we propose a metric that quantifies how well a learned model strategy aligns with the underlying physical principles of the problem. This novel tool enables automated model validation beyond conventional error metrics. We demonstrate, for instance, its capacity as an indicator of model generalization even when only limited data are available. Moreover, it facilitates understanding of how decisions made during the machine learning development process, such as data selection, pre-processing, or training parameters, affect learned strategies. As a result, we obtain more physically reasonable models, a prerequisite not only for robustness but also for meaningful insights into turbine operation by domain experts. We illustrate the latter in the context of wind turbine performance monitoring. In summary, the framework aims to guide researchers and practitioners alike toward a more informed selection and utilization of data-driven wind turbine power curve models.