Master’s Thesis Presentation • Data Systems — Discovering Domain Orders through Order Dependencies

Monday, April 12, 2021 11:00 AM EDT

Please note: This master’s thesis presentation will be given online.

MohammadReza Karegar, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Lukasz Golab

Most real-world data come with explicitly defined domain orders; e.g., lexicographic order for strings, numeric for integers, and chronological for time. Our goal is to discover implicit domain orders that we do not already know; for instance, that the order of months in the Chinese Lunar calendar is Corner < Apricot < Peach. To do so, we enhance data profiling methods by discovering implicit domain orders in data through order dependencies. We enumerate tractable special cases and proceed towards the most general case, which we prove is NP-complete. We then consider discovering approximate implicit orders; i.e., those that exist with some exceptions. We propose definitions of approximate implicit orders and prove that all non-trivial cases are NP-complete. We show that the NP-complete cases nevertheless can be effectively handled by a SAT solver. We also devise an interestingness measure to rank the discovered implicit domain orders.

Based on an extensive suite of experiments with real-world data, we establish the efficacy of our algorithms and the utility of the discovered domain orders, demonstrating significant added value in two applications (data profiling and data mining).


To join this master’s thesis presentation on Zoom, please go to https://zoom.us/j/97621624694?pwd=a2MvZnlsLzFxVDFrMXJ1U2kza0pEQT09.

Location 
Online presentation
200 University Avenue West

Waterloo, ON N2L 3G1
Canada