Using ML with scrutiny
Undoubtedly, AI or ML can fulfil many functions beyond the reach of traditional software, e.g. image recognition, item recommendation and even task planning. In real-world industries, however, employing ML is never as simple and straightforward as what we see in ML textbooks (i.e. users feed in input data and get outputs from ML models). The beauty of ML-model-based systems is that we don’t have to define any rules for how they should function. Instead, the systems’ behaviors are determined entirely by how the models are trained. In other words, the training data, ML model code and training configurations together work like a black box that produces the final model.
Directly exposing an ML model (e.g. via REST APIs) to users or downstream processes is risky, since it might produce confusing and disappointing outputs if the input data at inference time is not drawn from the same distribution as the training data. Therefore, data validation* and fallback mechanisms are typically added to ML systems.
(* In this blog, the scope of data validation does not cover data schema checking.)
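To make the idea of validation plus fallback concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a recommended production design: the names `RangeValidator`, `predict_with_fallback` and `toy_model` are hypothetical, the z-score check with a threshold of 3 is just one simple way to flag inputs that look unlike the training data, and the constant fallback value stands in for whatever rule-based default a real system would use.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Sequence


@dataclass
class RangeValidator:
    """Hypothetical helper: flags inputs that lie far from the training data."""

    means: list[float]
    stds: list[float]
    max_z: float = 3.0  # assumed threshold; tune it for the use case

    @classmethod
    def fit(cls, rows: Sequence[Sequence[float]], max_z: float = 3.0) -> "RangeValidator":
        # Compute per-feature mean and sample standard deviation.
        columns = list(zip(*rows))
        return cls([mean(c) for c in columns], [stdev(c) for c in columns], max_z)

    def is_valid(self, x: Sequence[float]) -> bool:
        # Schema checks (length, types) are assumed to happen elsewhere.
        for value, m, s in zip(x, self.means, self.stds):
            if s == 0:
                if value != m:  # constant feature in training, different value now
                    return False
                continue
            if abs(value - m) / s > self.max_z:
                return False
        return True


def predict_with_fallback(
    model: Callable[[Sequence[float]], float],
    validator: RangeValidator,
    x: Sequence[float],
    fallback: float,
) -> tuple[float, str]:
    """Return (prediction, source); use the fallback when validation fails."""
    if not validator.is_valid(x):
        return fallback, "fallback"
    return model(x), "model"


def toy_model(x: Sequence[float]) -> float:
    # Stand-in for a trained model; a real system would call the model here.
    return sum(x)


train_rows = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.5, 9.0]]
validator = RangeValidator.fit(train_rows)

in_range = predict_with_fallback(toy_model, validator, [2.0, 10.5], fallback=0.0)
out_of_range = predict_with_fallback(toy_model, validator, [50.0, 10.5], fallback=0.0)
print(in_range)      # (12.5, 'model')
print(out_of_range)  # (0.0, 'fallback')
```

Returning the source of the answer alongside the prediction lets downstream processes log or treat fallback outputs differently from genuine model outputs.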