By now, there should be little doubt that AI and machine learning are disrupting the software development industry, offering much broader functionality (much of it far beyond the capabilities of traditional software) through a completely different approach. Notably, code is no longer the backbone of developing an ML or AI system. Instead, preparing high-quality data and managing the execution and evaluation of experiments matter more to the final deliverable. With the latest deep learning libraries and frameworks, e.g. TensorFlow or PyTorch, most models can be built in fewer than 100 lines of code. On one hand, it is genuinely impressive to see a fancy face recognition application literally built on a short code snippet. On the other hand, this raises questions about how to formulate appropriate processes that ensure the effectiveness, quality and efficiency of this novel approach to software development.
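To make the "fewer than 100 lines" point concrete, here is a minimal sketch of defining and running a small classifier in PyTorch. The input size (28×28 grayscale images), layer widths, and 10-class output are illustrative assumptions, not taken from the original text:

```python
import torch
from torch import nn

# Illustrative model: a tiny fully connected classifier.
# All sizes here are assumptions chosen for the example.
model = nn.Sequential(
    nn.Flatten(),            # (batch, 1, 28, 28) -> (batch, 784)
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),      # 10 output classes
)

# Forward pass on a batch of 4 random "images".
batch = torch.randn(4, 1, 28, 28)
logits = model(batch)
print(logits.shape)  # torch.Size([4, 10])
```

The model definition itself is a handful of lines; even with a full training loop added, such a script stays well under 100 lines, which is exactly why the data preparation and experiment management around it dominate the overall effort.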
It is natural for people to apply an existing knowledge framework to something new. However, after many attempts to fit ML workflows into software engineering conventions, it became clear that either some traditional software engineering concepts no longer apply, or some emerging components are not covered at all. This is not surprising if you look at the following diagram of how the ML engineering development lifecycle differs from traditional software engineering: “Data” and “Model” are two new components in ML engineering. Arguably, more components in the pipeline means more effort is needed to take care of them.