Master’s Thesis Presentation • Machine Learning • Using Domain Adaptation to Improve Water Quality Modeling with Sparse Data

Monday, December 16, 2024 2:00 pm - 3:00 pm EST (GMT -05:00)

Please note: This master’s thesis presentation will take place online.

Chi-Chung Cheung, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Anita Layton

Water Quality (WQ) modelling is important not only for the conservation of ecosystems but also for the welfare of modern human society. However, collecting enough high-quality data to train WQ prediction models is difficult. Unlike hydrological measurements, current WQ data collection methods are constrained by cost, limited spatial coverage, and temporal sparsity.

This thesis explores using Domain Adaptation (DA) to overcome the data scarcity problem. By treating each WQ measuring location as a separate domain, high-resolution data from other locations can be used to better model a target location that has sparse data. The chosen DA method is inspired by domain-invariant (DI) representation learning. The model consists of (1) a submodel f, shared across domains, representing the DI portion, and (2) one submodel g per domain representing the domain-variant portion.
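As a rough illustration of this layout, the sketch below keeps one shared f and a dictionary holding one g per domain, and routes each input through its own station’s g. The abstract does not specify the submodel architectures. The single linear layers, the class names `Linear` and `DomainAdaptedModel`, the station IDs, and the dimensions are all hypothetical choices made for this example.

```python
import random
from typing import Dict, List

Vector = List[float]


class Linear:
    """Tiny dense layer y = W x + b (hypothetical stand-in for a real submodel)."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        # One output per weight row: dot(row, x) + bias.
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]


class DomainAdaptedModel:
    """Hypothetical sketch: one shared domain-invariant f plus one domain-specific g per station."""

    def __init__(self, domains: List[str], in_dim: int, hidden_dim: int, out_dim: int) -> None:
        # f is shared by every domain, so all stations' data contribute to it.
        self.f = Linear(in_dim, hidden_dim, seed=1)
        # Each domain (station) gets its own g, mapping f's representation to a prediction.
        self.g: Dict[str, Linear] = {
            d: Linear(hidden_dim, out_dim, seed=10 + i) for i, d in enumerate(domains)
        }

    def predict(self, x: Vector, domain: str) -> Vector:
        # Composed form g_domain(f(x)).
        return self.g[domain](self.f(x))
```

Keeping the domain-specific parts in a dictionary keyed by station means that changing one station’s g leaves every other station’s predictions untouched, while f remains common to all of them.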

The main findings of this thesis are as follows:

  1. The optimal model sizes differ between pretraining and training.
  2. A station’s basin was not a good measure of similarity between stations.
  3. Once the number of domains was already high, adding more domains did not improve model performance.
  4. Simply adding the outputs of f and g (i.e., f(x) + g(x)) did not perform as well as passing the output of f through g (i.e., g(f(x))).
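The two combination schemes in finding 4 differ in what g receives as input. In the additive form, g reads the raw input x, and its output must have the same shape as f’s. In the composed form, g reads only f’s output. The sketch below illustrates the difference with plain Python functions on lists. The helper names `additive` and `composed` and the toy f and g are hypothetical, and the sketch does not explain why the composed form performed better in the thesis.

```python
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector], Vector]


def additive(f: Model, g: Model) -> Model:
    """f(x) + g(x): g sees the raw input and adds a correction to f's output."""
    def h(x: Vector) -> Vector:
        fx, gx = f(x), g(x)
        if len(fx) != len(gx):
            raise ValueError("f and g must produce outputs of the same length")
        return [a + b for a, b in zip(fx, gx)]
    return h


def composed(f: Model, g: Model) -> Model:
    """g(f(x)): g sees only f's (shared) representation of the input."""
    def h(x: Vector) -> Vector:
        return g(f(x))
    return h


# Hypothetical toy submodels for illustration only.
f = lambda x: [sum(x)]
print(additive(f, lambda x: [0.5 * x[0]])([1.0, 2.0]))        # [3.5]
print(composed(f, lambda z: [2.0 * z[0] + 1.0])([1.0, 2.0]))  # [7.0]
```

In the composed form, g can apply an arbitrary domain-specific transformation to the shared representation. In the additive form, g can only contribute an offset on top of f’s prediction.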

These findings support the effectiveness of DA in WQ modelling and highlight several considerations that affect the final performance. Furthermore, these findings are relevant not only to this particular DA method but also to DA in general.


Attend this master’s thesis presentation on MS Teams