When Hanna Wallach first started testing machine learning models, the tasks were well-defined and easy to evaluate. Did the model correctly identify the cats in an image? Did it accurately predict the ...