Masoumeh Shafieinejad, PhD candidate
David R. Cheriton School of Computer Science
Outlier detection plays a significant role in many real-world applications such as intrusion, malfunction, and fraud detection. Traditionally, outlier detection techniques find outliers in the context of the whole dataset. This practice, however, neglects data points, called contextual outliers, that are not outliers in the whole dataset but are outliers within specific neighborhoods. Contextual outliers are particularly important in data exploration and in targeted anomaly explanation and diagnosis.
In these scenarios, the data owner releases the following information: i) the attributes that contribute to the abnormality of an outlier (the metric), ii) a contextual description of the outlier’s neighborhoods (the context), and iii) the utility, such as the outlier’s significance, defined by the number of records (the population size) covered by the outlier’s context. Revealing this information, however, also leaks information about the other individuals in the population, violating their privacy.
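As a minimal illustration of how a quantity like the population size could be released while protecting individuals, the standard Laplace mechanism from the differential privacy literature adds noise calibrated to the query's sensitivity. This is a generic sketch, not the talk's algorithm; the function name, the epsilon value, and the example count are illustrative.

```python
import math
import random

def laplace_mechanism(true_count, epsilon):
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices. Noise is sampled via the inverse CDF.
    """
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Example: a context covering 5,000 records, released with epsilon = 0.5.
private_size = laplace_mechanism(5000, epsilon=0.5)
```

Smaller epsilon means stronger privacy but noisier answers; the released value is unbiased, so averaging many hypothetical releases would concentrate around the true count.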
We address this population privacy violation and propose using differential privacy techniques to protect the privacy of individuals. Applying differential privacy is challenging, however, because its direct application imposes intensive computation on the contextual outlier release algorithm. To overcome this challenge, we map the contexts onto a graph structure and introduce differentially private graph search algorithms as efficient solutions to the computational problem that differential privacy introduces.
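One common way to organize contexts into a searchable graph, which may differ in detail from the structure proposed in the talk, is to treat each context as a conjunction of attribute-value predicates and connect each context to its refinements obtained by adding one predicate. A breadth-first search can then enumerate contexts from general to specific, pruning any branch whose population falls below a threshold, since refining a context can only shrink its population. The toy dataset and all names below are hypothetical.

```python
from collections import deque

def population(context, data):
    # A context is a frozenset of (attribute, value) predicates;
    # its population is the number of records satisfying all of them.
    return sum(all(r.get(a) == v for a, v in context) for r in data)

def search_contexts(data, attributes, min_size):
    """BFS over the context graph rooted at the empty context.

    Each edge refines a context by one extra predicate. Branches with
    population below min_size are pruned, because refinements are
    monotone: they can only remove records, never add them.
    """
    domains = {a: sorted({r[a] for r in data}) for a in attributes}
    start = frozenset()
    seen = {start}
    queue = deque([start])
    results = []
    while queue:
        ctx = queue.popleft()
        results.append((ctx, population(ctx, data)))
        used = {a for a, _ in ctx}
        for a in attributes:
            if a in used:
                continue
            for v in domains[a]:
                child = ctx | {(a, v)}
                if child not in seen and population(child, data) >= min_size:
                    seen.add(child)
                    queue.append(child)
    return results

# Hypothetical toy dataset of four records.
records = [
    {"city": "Waterloo", "age": "20s", "plan": "basic"},
    {"city": "Waterloo", "age": "30s", "plan": "basic"},
    {"city": "Toronto",  "age": "20s", "plan": "premium"},
    {"city": "Toronto",  "age": "20s", "plan": "basic"},
]
contexts = search_contexts(records, ["city", "age", "plan"], min_size=1)
```

In a differentially private variant, the exact `population` checks would be replaced by noisy counts, which is where the efficiency of the graph search becomes essential: the privacy budget must be spread across every count the search performs.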