Data Cleaning Surveys

  1. Erhard Rahm, Hong Hai Do, Data Cleaning: Problems and Current Approaches, IEEE Data Eng. Bull. 23(4): 3-13 (2000)
  2. Tamraparni Dasu, Theodore Johnson, Exploratory Data Mining and Data Cleaning, John Wiley 2003, ISBN 0-471-26851-8
  3. Ihab F. Ilyas, Xu Chu, Trends in Cleaning Relational Data, Under Review

Error Detection

1.1 Constraints Language

  1. Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, Anastasios Kementsietsidis, Conditional Functional Dependencies for Data Cleaning, ICDE 2007
  2. L. Bravo, W. Fan, and S. Ma, Extending dependencies with conditions, VLDB 2007
  3. Wenfei Fan, Shuai Ma, Yanli Hu, Jie Liu, Yinghui Wu, Propagating Functional Dependencies with Conditions, VLDB 2008
  4. Jiannan Wang, Nan Tang, Towards dependable data repairing with fixing rules, SIGMOD 2014

1.2 Constraints Discovery

  1. Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, Hannu Toivonen, TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies, The Computer Journal, Vol. 42, No. 2, 1999
  2. I. F. Ilyas, V. Markl, P. J. Haas, P. Brown, and A. Aboulnaga, CORDS: Automatic discovery of correlations and soft functional dependencies, SIGMOD 2004
  3. L. Golab, H. J. Karloff, F. Korn, D. Srivastava, and B. Yu, On generating near-optimal tableaux for conditional functional dependencies, VLDB 2008
  4. F. Chiang and R. J. Miller, Discovering data quality rules, VLDB 2008
  5. W. Fan, F. Geerts, L. V. Lakshmanan, and M. Xiong, Discovering conditional functional dependencies, ICDE 2009
  6. Xu Chu, Ihab F. Ilyas, Paolo Papotti, Discovering Denial Constraints, VLDB 2014
  7. Arvid Heise, Jorge-Arnulfo Quiané-Ruiz, Ziawasch Abedjan, Anja Jentzsch, Felix Naumann, Scalable Discovery of Unique Column Combinations, VLDB 2014

1.3 Causality and Error Propagation

  1. Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, Dan Suciu, Tracing data errors with view-conditioned causality, SIGMOD 2011
  2. Eugene Wu, Samuel Madden, Scorpion: Explaining Away Outliers in Aggregate Queries, VLDB 2013
  3. Anup Chalamalla, Ihab F. Ilyas, Mourad Ouzzani, and Paolo Papotti, Descriptive and Prescriptive Data Cleaning, SIGMOD 2014

Data Repairing

2.1 Record Linkage

  1. Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, Vassilios S. Verykios, Duplicate Record Detection: A Survey, IEEE Trans. Knowl. Data Eng. (2007)
  2. W. Fan, J. Li, S. Ma, N. Tang, and W. Yu, Interaction between record matching and data repairing, SIGMOD 2011

2.2 Automatic Data Repairing of Constraints Violations

  1. P. Bohannon, W. Fan, M. Flaster, and R. Rastogi, A cost-based model and effective heuristic for repairing constraints by value modification, SIGMOD 2005
  2. A. Lopatenko and L. Bravo, Efficient approximation algorithms for repairing inconsistent databases, ICDE 2007
  3. G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma, Improving data quality: Consistency and accuracy, VLDB 2007
  4. Solmaz Kolahi, Laks V. S. Lakshmanan, On Approximating Optimum Repairs for Functional Dependency Violations, ICDT 2009
  5. Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Wenyuan Yu, Towards certain fixes with editing rules and master data, VLDB 2012
  6. Xu Chu, Ihab F. Ilyas, Paolo Papotti, Holistic data cleaning: Putting violations into context, ICDE 2013
  7. Mohamed Yakout, Laure Berti-Equille, Ahmed K. Elmagarmid, Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes, SIGMOD 2013

2.3 Probabilistic and Model-based Data Repairing

  1. George Beskales, Mohamed A. Soliman, Ihab F. Ilyas, Shai Ben-David, Modeling and Querying Possible Repairs in Duplicate Detection, VLDB 2009
  2. George Beskales, Ihab F. Ilyas, Lukasz Golab, Sampling the Repairs of Functional Dependency Violations under Hard Constraints, VLDB 2010

2.4 Repairing Constraints and Data

  1. Fei Chiang, Renée J. Miller, A Unified Model for Data and Constraint Repair, ICDE 2011
  2. George Beskales, Ihab F. Ilyas, Lukasz Golab, Artur Galiullin, On the relative trust between inconsistent data and inaccurate constraints, ICDE 2013
  3. Maksims Volkovs, Fei Chiang, Jaroslaw Szlichta, Renée J. Miller, Continuous data cleaning, ICDE 2014

2.5 User Involved Data Repairing

  1. Sunita Sarawagi, Anuradha Bhamidipaty, Interactive deduplication using active learning, KDD 2002
  2. Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani, Ihab F. Ilyas, Guided data repair, VLDB 2011
  3. Jiannan Wang, Tim Kraska, Michael J. Franklin, Jianhua Feng, CrowdER: Crowdsourcing Entity Resolution, VLDB 2012

2.6 Data Cleaning Systems

  1. Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, AJAX: An Extensible Data Cleaning Tool, SIGMOD 2000
  2. Vijayshankar Raman, Joseph M. Hellerstein, Potter's Wheel: An Interactive Data Cleaning System, VLDB 2001
  3. Amr Ebaid, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin, NADEEF: A Generalized Data Cleaning System, VLDB 2013
  4. Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro, The LLUNATIC Data-Cleaning Framework, VLDB 2013
  5. Michael Stonebraker, Daniel Bruckner, Ihab F. Ilyas, George Beskales, Mitch Cherniack, Stanley B. Zdonik, Alexander Pagan, Shan Xu, Data Curation at Scale: The Data Tamer System, CIDR 2013