Cristina Tavares, Master’s candidate
David R. Cheriton School of Computer Science
Several data analysis processes have been proposed by academia and industry to describe the phases that data analysis experts go through when solving their problems. CRISP-DM and SEMMA are examples of process models widely applied to data analysis projects. In particular, CRISP-DM includes modeling as one of its phases, which involves selecting a modeling technique, generating a test design, building a model, and assessing the model. However, automating these data analysis modeling processes in software faces numerous challenges from a software engineering perspective. These include a lack of software flexibility to accommodate complex usage and deployment variations, and a lack of framework designs that take variability into account to support the design process. Together, these gaps make it difficult to evaluate the possibilities and opportunities for automating the data analysis modeling process.
This thesis proposes a variability-aware design approach to the data analysis modeling process, which involves (i) assessing the variabilities inherent in the CRISP-DM data analysis modeling phase and providing feature models that represent these variabilities; (ii) defining a framework design that captures the identified variabilities; and (iii) evaluating the developed framework design in terms of its possibilities for process automation.
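As a rough illustration of what a feature model of the CRISP-DM modeling phase might look like, the sketch below encodes the four modeling tasks as mandatory features and checks whether a set of selected features forms a valid configuration. The alternative techniques, test designs, and metrics, as well as the names `Feature`, `is_valid`, and `MODELING`, are hypothetical and are not taken from the thesis; the sketch also omits cross-tree constraints (for example, tying a metric to a kind of technique).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    mandatory: bool = True   # must be selected whenever its parent is selected
    group: str = "and"       # "and": independent children, "xor": exactly one, "or": at least one
    children: list[Feature] = field(default_factory=list)


def names(f: Feature):
    """Yield the names of a feature and all of its descendants."""
    yield f.name
    for c in f.children:
        yield from names(c)


def _check(f: Feature, selected: set[str]) -> bool:
    if f.name not in selected:
        # An unselected feature must not have any selected descendant.
        return not any(n in selected for n in names(f))
    chosen = [c for c in f.children if c.name in selected]
    if any(c.mandatory and c.name not in selected for c in f.children):
        return False
    if f.group == "xor" and f.children and len(chosen) != 1:
        return False
    if f.group == "or" and f.children and not chosen:
        return False
    return all(_check(c, selected) for c in f.children)


def is_valid(root: Feature, selected: set[str]) -> bool:
    """Return True if `selected` is a valid configuration of the feature model."""
    return set(selected) <= set(names(root)) and root.name in selected and _check(root, selected)


# Hypothetical feature model: the four CRISP-DM modeling tasks as mandatory
# features, with illustrative (not thesis-derived) variation points.
MODELING = Feature("Modeling", children=[
    Feature("SelectTechnique", group="xor", children=[
        Feature("DecisionTree", mandatory=False),
        Feature("NeuralNetwork", mandatory=False),
        Feature("LinearRegression", mandatory=False),
    ]),
    Feature("GenerateTestDesign", group="xor", children=[
        Feature("Holdout", mandatory=False),
        Feature("CrossValidation", mandatory=False),
    ]),
    Feature("BuildModel"),
    Feature("AssessModel", group="or", children=[
        Feature("Accuracy", mandatory=False),
        Feature("F1Score", mandatory=False),
        Feature("RMSE", mandatory=False),
    ]),
])

config = {"Modeling", "SelectTechnique", "DecisionTree", "GenerateTestDesign",
          "CrossValidation", "BuildModel", "AssessModel", "Accuracy"}
print(is_valid(MODELING, config))                      # True
print(is_valid(MODELING, config | {"NeuralNetwork"}))  # False: two alternative techniques
```

Representing the variation points as data rather than code is one way a framework design could let a configuration be validated before any modeling step is run.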