Data Cleaning Surveys
Erhard Rahm, Hong Hai Do
Data Cleaning: Problems and Current Approaches, IEEE Data Eng. Bull. 23(4): 3-13 (2000)
Tamraparni Dasu, Theodore Johnson
Exploratory Data Mining and Data Cleaning, John Wiley 2003, ISBN 0-471-26851-8
Ihab F. Ilyas, Xu Chu
Trends in Cleaning Relational Data, Under Review
Error Detection
1.1 Constraints Language
Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, Anastasios Kementsietsidis,
Conditional Functional Dependencies for Data Cleaning,
ICDE 2007
L. Bravo, W. Fan, and S. Ma,
Extending dependencies with conditions,
VLDB 2007
Wenfei Fan, Shuai Ma, Yanli Hu, Jie Liu, Yinghui Wu,
Propagating Functional Dependencies with Conditions,
VLDB 2008
Jiannan Wang, Nan Tang
Towards dependable data repairing with fixing rules,
1.2 Constraints Discovery
Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, Hannu Toivonen,
TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies,
The Computer Journal, Vol. 42, No. 2, 1999
I. F. Ilyas, V. Markl, P. J. Haas, P. Brown, and A. Aboulnaga,
CORDS: Automatic discovery of correlations and soft functional dependencies,
L. Golab, H. J. Karloff, F. Korn, D. Srivastava, and B. Yu,
On generating near-optimal tableaux for conditional functional dependencies,
VLDB 2008
F. Chiang and R. J. Miller,
Discovering data quality rules,
VLDB 2008
W. Fan, F. Geerts, L. V. Lakshmanan, and M. Xiong,
Discovering conditional functional dependencies,
ICDE 2009
Xu Chu, Ihab F. Ilyas, Paolo Papotti,
Discovering Denial Constraints,
VLDB 2014
Arvid Heise, Jorge-Arnulfo Quiané-Ruiz, Ziawasch Abedjan, Anja Jentzsch, Felix Naumann
Scalable Discovery of Unique Column Combinations, VLDB 2014
1.3 Causality and Error Propagation
Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, Dan Suciu,
Tracing data errors with view-conditioned causality, SIGMOD 2011
Eugene Wu, Samuel Madden, Scorpion: Explaining Away Outliers in Aggregate Queries, VLDB 2013
Anup Chalamalla, Ihab F. Ilyas, Mourad Ouzzani, and Paolo Papotti, Descriptive and Prescriptive Data Cleaning, SIGMOD 2014
Data Repairing
2.1 Record Linkage
Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, Vassilios S. Verykios, Duplicate Record Detection: A Survey,
IEEE Trans. Knowl. Data Eng. (2007)
W. Fan, J. Li, S. Ma, N. Tang, and W. Yu, Interaction between record matching and data repairing,
2.2 Automatic Data Repairing of Constraints Violations
P. Bohannon, W. Fan, M. Flaster, and R. Rastogi,
A cost-based model and effective heuristic for repairing constraints by value modification,
A. Lopatenko and L. Bravo,
Efficient approximation algorithms for repairing inconsistent databases,
ICDE 2007
G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma,
Improving data quality: Consistency and accuracy,
VLDB 2007
Solmaz Kolahi, Laks V. S. Lakshmanan,
On Approximating Optimum Repairs for Functional Dependency Violations,
ICDT 2009
Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Wenyuan Yu,
Towards certain fixes with editing rules and master data,
VLDB 2012
Xu Chu, Ihab F. Ilyas, Paolo Papotti,
Holistic data cleaning: Putting violations into context,
ICDE 2013
Mohamed Yakout, Laure Berti-Equille, Ahmed K. Elmagarmid,
Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes,
2.3 Probabilistic and Model-based Data Repairing
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas, Shai Ben-David,
Modeling and Querying Possible Repairs in Duplicate Detection,
VLDB 2009
George Beskales, Ihab F. Ilyas, Lukasz Golab,
Sampling the Repairs of Functional Dependency Violations under Hard Constraints,
VLDB 2010
2.4 Repairing Constraints and Data
Fei Chiang, Renée J. Miller,
A Unified Model for Data and Constraint Repair,
ICDE 2011
George Beskales, Ihab F. Ilyas, Lukasz Golab, Artur Galiullin,
On the relative trust between inconsistent data and inaccurate constraints,
ICDE 2013
Maksims Volkovs, Fei Chiang, Jaroslaw Szlichta, Renée J. Miller,
Continuous data cleaning,
ICDE 2014
2.5 User Involved Data Repairing
Sunita Sarawagi, Anuradha Bhamidipaty,
Interactive deduplication using active learning,
KDD 2002
Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani, Ihab F. Ilyas,
Guided data repair,
VLDB 2011
Jiannan Wang, Tim Kraska, Michael J. Franklin, Jianhua Feng,
CrowdER: Crowdsourcing Entity Resolution,
VLDB 2012
2.6 Data Cleaning Systems
Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon,
AJAX: An Extensible Data Cleaning Tool,
Vijayshankar Raman, Joseph M. Hellerstein,
Potter's Wheel: An Interactive Data Cleaning System,
VLDB 2001
Amr Ebaid, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin,
NADEEF: A Generalized Data Cleaning System, VLDB 2013
Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro,
The LLUNATIC Data-Cleaning Framework,
VLDB 2013
Michael Stonebraker, Daniel Bruckner, Ihab F. Ilyas, George Beskales, Mitch Cherniack, Stanley B. Zdonik, Alexander Pagan, Shan Xu,
Data Curation at Scale: The Data Tamer System,
CIDR 2013