Data Cleaning Surveys
-
Erhard Rahm, Hong Hai Do
Data Cleaning: Problems and Current Approaches, IEEE Data Eng. Bull. 23(4): 3-13 (2000)
-
Tamraparni Dasu, Theodore Johnson
Exploratory Data Mining and Data Cleaning, John Wiley 2003, ISBN 0-471-26851-8
-
Ihab F. Ilyas, Xu Chu
Trends in Cleaning Relational Data, Under Review
1 Error Detection
1.1 Constraints Language
-
Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, Anastasios Kementsietsidis,
Conditional Functional Dependencies for Data Cleaning,
ICDE 2007
-
L. Bravo, W. Fan, and S. Ma,
Extending dependencies with conditions,
VLDB 2007
-
Wenfei Fan, Shuai Ma, Yanli Hu, Jie Liu, Yinghui Wu,
Propagating Functional Dependencies with Conditions,
VLDB 2008
-
Jiannan Wang, Nan Tang
Towards dependable data repairing with fixing rules,
SIGMOD 2014
1.2 Constraints Discovery
-
Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, Hannu Toivonen,
TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies,
The Computer Journal, Vol. 42, No. 2, 1999
-
I. F. Ilyas, V. Markl, P. J. Haas, P. Brown, and A. Aboulnaga,
CORDS: Automatic discovery of correlations and soft functional dependencies,
SIGMOD 2004
-
L. Golab, H. J. Karloff, F. Korn, D. Srivastava, and B. Yu,
On generating near-optimal tableaux for conditional functional dependencies,
VLDB 2008
-
F. Chiang and R. J. Miller,
Discovering data quality rules,
VLDB 2008
-
W. Fan, F. Geerts, L. V. Lakshmanan, and M. Xiong,
Discovering conditional functional dependencies,
ICDE 2009
-
Xu Chu, Ihab F. Ilyas, Paolo Papotti,
Discovering Denial Constraints,
VLDB 2014
-
Arvid Heise, Jorge-Arnulfo Quiané-Ruiz, Ziawasch Abedjan, Anja Jentzsch, Felix Naumann
Scalable Discovery of Unique Column Combinations, VLDB 2014
1.3 Causality and Error Propagation
-
Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, Dan Suciu,
Tracing data errors with view-conditioned causality, SIGMOD 2011
-
Eugene Wu, Samuel Madden, Scorpion: Explaining Away Outliers in Aggregate Queries, VLDB 2013
-
Anup Chalamalla, Ihab F. Ilyas, Mourad Ouzzani, and Paolo Papotti, Descriptive and Prescriptive Data Cleaning, SIGMOD 2014
2 Data Repairing
2.1 Record Linkage
-
Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, Vassilios S. Verykios,
Duplicate Record Detection: A Survey,
IEEE Trans. Knowl. Data Eng. (2007)
-
W. Fan, J. Li, S. Ma, N. Tang, and W. Yu,
Interaction between record matching and data repairing,
SIGMOD 2011
2.2 Automatic Data Repairing of Constraints Violations
-
P. Bohannon, W. Fan, M. Flaster, and R. Rastogi,
A cost-based model and effective heuristic for repairing constraints by value modification,
SIGMOD 2005
-
A. Lopatenko and L. Bravo,
Efficient approximation algorithms for repairing inconsistent databases,
ICDE 2007
-
G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma,
Improving data quality: Consistency and accuracy,
VLDB 2007
-
Solmaz Kolahi, Laks V. S. Lakshmanan,
On Approximating Optimum Repairs for Functional Dependency Violations,
ICDT 2009
-
Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Wenyuan Yu,
Towards certain fixes with editing rules and master data,
VLDB 2012
-
Xu Chu, Ihab F. Ilyas, Paolo Papotti,
Holistic data cleaning: Putting violations into context,
ICDE 2013
-
Mohamed Yakout, Laure Berti-Equille, Ahmed K. Elmagarmid,
Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes,
SIGMOD 2013
2.3 Probabilistic and Model-based Data Repairing
-
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas, Shai Ben-David,
Modeling and Querying Possible Repairs in Duplicate Detection,
VLDB 2009
-
George Beskales, Ihab F. Ilyas, Lukasz Golab,
Sampling the Repairs of Functional Dependency Violations under Hard Constraints,
VLDB 2010
2.4 Repairing Constraints and Data
-
Fei Chiang, Renée J. Miller,
A Unified Model for Data and Constraint Repair,
ICDE 2011
-
George Beskales, Ihab F. Ilyas, Lukasz Golab, Artur Galiullin,
On the relative trust between inconsistent data and inaccurate constraints,
ICDE 2013
-
Maksims Volkovs, Fei Chiang, Jaroslaw Szlichta, Renée J. Miller
Continuous data cleaning,
ICDE 2014
2.5 User Involved Data Repairing
-
Sunita Sarawagi, Anuradha Bhamidipaty,
Interactive deduplication using active learning,
KDD 2002
-
Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani, Ihab F. Ilyas,
Guided data repair,
VLDB 2011
-
Jiannan Wang, Tim Kraska, Michael J. Franklin, Jianhua Feng,
CrowdER: Crowdsourcing Entity Resolution,
VLDB 2012
2.6 Data Cleaning Systems
-
Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon,
AJAX: An Extensible Data Cleaning Tool,
SIGMOD 2000
-
Vijayshankar Raman, Joseph M. Hellerstein,
Potter's Wheel: An Interactive Data Cleaning System,
VLDB 2001
-
Amr Ebaid, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin,
NADEEF: A Generalized Data Cleaning System, VLDB 2013
-
Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro,
The LLUNATIC Data-Cleaning Framework,
VLDB 2013
-
Michael Stonebraker, Daniel Bruckner, Ihab F. Ilyas, George Beskales, Mitch Cherniack, Stanley B. Zdonik, Alexander Pagan, Shan Xu,
Data Curation at Scale: The Data Tamer System,
CIDR 2013