Master’s Thesis Presentation • Software Engineering • Variability-aware Neo4j for Analyzing a Graphical Model of a Software Product Line

Wednesday, August 9, 2023 9:30 am - 10:30 am EDT (GMT -04:00)

Please note: This master’s thesis presentation will take place in DC 2564.

Xiang Chen, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jo Atlee

A software product line (SPL) eases the development of families of related products by managing and integrating a collection of mandatory and optional features (units of functionality). Individual products can be derived from the product line by selecting among the optional features. Companies that successfully employ SPLs report dramatic improvements in rapid product development, software quality, labour needs, support for mass customization, and time to market.

In a product line of reasonable size, it is impractical to verify every product because the number of possible feature combinations is exponential in the number of features. As a result, developers might verify a small fraction of products and limit the choices offered to consumers, thereby forgoing one of the greatest promises of product lines: mass customization.

To improve the efficiency of analyzing SPLs, (1) we analyze a model of an SPL rather than its code and (2) we analyze the SPL model itself rather than models of its products. We extract a model comprising facts (e.g., functions, variables, assignments) from an SPL’s source-code artifacts. The facts from different software components are linked together into a lightweight model of the code, called a factbase. The resulting factbase is a typed graphical model that can be analyzed using the Neo4j graph database.

In this thesis, we lift the Neo4j query engine to reason over a factbase of an entire SPL. Lifting the query engine makes any analysis that can be expressed in the Neo4j query language applicable to an SPL model. The lifted analyses return variability-aware results, in which each result is annotated with a feature expression denoting the products to which the result applies.
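To make the idea of a variability-aware result concrete, here is a minimal sketch in plain Python. It is not the thesis's implementation and does not use Neo4j. The toy factbase, the feature names (LOGGING, CACHE), and the function lifted_reachable are all hypothetical, and feature expressions are simplified to conjunctions of selected features, with a set of such conjunctions read as a disjunction.

```python
from collections import deque

# Hypothetical toy factbase: (source, relation, target, presence condition).
# A presence condition is simplified to a frozenset of features that must all
# be selected; the empty frozenset means the fact holds in every product.
EDGES = [
    ("main", "calls", "init", frozenset()),
    ("main", "calls", "log", frozenset({"LOGGING"})),
    ("log", "writes", "buffer", frozenset({"LOGGING"})),
    ("init", "writes", "buffer", frozenset({"CACHE"})),
]


def lifted_reachable(edges, start):
    """Return {node: set of conditions} for every node reachable from start.

    Each condition is a conjunction of features; the set of conditions
    attached to a node is read as their disjunction.
    """
    successors = {}
    for src, _rel, dst, pc in edges:
        successors.setdefault(src, []).append((dst, pc))

    results = {}
    queue = deque([(start, frozenset())])
    while queue:
        node, cond = queue.popleft()
        for dst, pc in successors.get(node, []):
            new_cond = cond | pc  # conjunction of conditions along the path
            seen = results.setdefault(dst, set())
            # Skip if an equal or weaker condition already covers this one.
            if any(old <= new_cond for old in seen):
                continue
            # Drop stronger conditions that the new one makes redundant.
            seen.difference_update({old for old in seen if new_cond < old})
            seen.add(new_cond)
            queue.append((dst, new_cond))
    return results


if __name__ == "__main__":
    for node, conds in lifted_reachable(EDGES, "main").items():
        print(node, [sorted(c) for c in conds])
```

In this sketch, buffer is reported as reachable from main under LOGGING or CACHE, so a single query over the product-line model answers the question for every product at once rather than once per product.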

We evaluated lifted Neo4j on five real-world open-source SPLs, with respect to ten commonly used analyses of interest. The first evaluation compares the performance of a post-processing approach with that of an on-the-fly approach for computing the feature expressions that annotate the variability-aware results of lifted Neo4j. In general, the on-the-fly approach has a smaller runtime than the post-processing approach. The second evaluation assesses the overhead of analyzing a model of an SPL versus a model of a single product, which ranges from 0.78% to 404%. In the third evaluation, we compare the outputs and performance of lifted Neo4j to those of related work that employs the variability-aware V-Soufflé Datalog engine. We found that lifted Neo4j is usually more efficient than V-Soufflé when returning the same results (i.e., the end points of path results). When lifted Neo4j returns complete path results, it is generally slower than V-Soufflé, although lifted Neo4j can outperform V-Soufflé on analyses that return short fixed-length paths.