Problem: Current developments in hardware, software, and information technologies enable various types of machine learning (ML) systems, most of which must also be integrated with complex environments. However, this has also made the analysis, design, and development of ML applications even harder. Recent studies report important findings on the failure of projects involving such systems. Much has been written about failure factors related to the technology in use (software, data, algorithms, processes, etc.) and to business or application domains (finance, health, production, etc.). This situation closely resembles the early days of the software engineering (SE) “discipline”, in other words, the “SE domain”: it took considerable time for SE to establish its underlying principles and its theoretical and philosophical foundations. Similarly, we think that one important issue for ML is the little or no emphasis given to the domain engineering (DE) of ML itself. DE models and products help us understand the environment in which ML systems are expected to operate. There is therefore a long way to go in describing the ML domain, a description that we believe would be a critical success factor for every ML project. This situation forms the main problem area of our study.

Method: Two main approaches can be adopted for the DE of ML: formal and application-focused. The formal approach uses mathematical methods and views a domain as a universe of discourse. It holds that understanding the subject area of a system (finance, health, transportation, etc.) requires domain engineering activities, and that these should precede requirements engineering. The second, application-focused approach regards a domain as a set of applications or systems and emphasizes the commonality and variability among the applications in that domain.
However, we argue that using only one of these approaches may not be sufficient, given the idiosyncratic requirements of the ML knowledge domain. In this paper, therefore, we propose a hybrid, two-phase approach. The first phase produces a conceptual domain model of ML together with its domain theory, which also forms the foundation of the second phase. Building on this, the second phase yields the structural, data flow, information, operational, and interaction models needed for the design and development of ML applications.

Conclusion: This study can be viewed as an initial step towards the DE of ML and as a presentation of our future research directions.

Keywords: Machine Learning, Domain Engineering, Requirements Engineering