University of Waterloo   

Haotian Zhang

Data Systems Group

David R. Cheriton School of Computer Science

University of Waterloo


I received my PhD from the Information Retrieval group and the Data Systems Group of the David R. Cheriton School of Computer Science at the University of Waterloo. My supervisors were Mark D. Smucker and Gordon V. Cormack. I also worked closely with Jimmy Lin, Maura Grossman, and Charlie Clarke.

My broad research interests include Information Retrieval, NLP, and Machine Learning (especially active learning and deep learning). My core research is building a High-Recall Information Retrieval system (HiCAL) to help users find all or nearly all relevant information more efficiently and effectively [4,5,6,11,12,14,15,16,18,19]. Another part of my research is understanding user behavior to improve ranking quality [13]. I also work on deep neural networks and apply them to Question Answering and ad hoc search problems [3,8,9,10,17].

Before studying at the University of Waterloo, I studied at Harbin Institute of Technology, China. I have studied and/or worked in Canada, the USA, France, and Italy.

Google Scholar Page of Haotian

  • Machine Learning Engineer at Wish, Toronto. May 2019 - Present

  • Software Engineer Intern at Wish, Toronto. May 2018 - Aug. 2018

    Product Boost: ad prediction and ranking. Model design and evaluation for recommendation systems. Spam product detection.
  • Research Intern at Oracle Labs, Boston, USA. June 2016 - Aug. 2016

    Learning to rank for eCommerce Search.
  • Software Engineer Intern at Adobe Systems (Beijing). Oct. 2011 - July 2012

  • Successfully passed my PhD defence. Thanks to my advisors and PhD committee members. April 10, 2019

  • Participated in TREC CORE 2018 [16]; our run UWaterMDS_Rank ranked highest among all 72 runs! DC, Nov, 2018

  • Presented our high-recall work [15] at CIKM 2018, Italy. Oct, 2018.

  • The code for the High-Recall Information Retrieval system (HiCAL) is now public: HiCAL. July, 2018.

  • Joined Wish as a software engineer intern, working on machine learning models for eCommerce problems. Toronto, April, 2018.

  • Passed the PhD Comp2 examination [proposal]; now a PhD candidate. April, 2018.

  • Presented our user requery behaviour work [13] at CHIIR 2018, New Jersey. Mar, 2018.

  • Presented our High-Recall Information Retrieval work to the Google Cloud team [Mountain View]. Dec, 2017.

  • Presented our work at TREC CORE 2017 [11,12]. Our runs UWaterMDS_* ranked highest among all 75 runs submitted by 15 different teams. Cheers! Nov, 2017

  • [19] "Increasing the Efficiency of High-Recall Information Retrieval", PhD Thesis

  • [18] "Dynamic Sampling Meets Pooling", SIGIR 2019

  • [17] "Simple Applications of BERT for Ad Hoc Document Retrieval", 2019

  • [16] "UWaterlooMDS at the TREC 2018 Common Core Track", TREC 2018
    [pdf] [Slides]

  • [15] "Effective User Interaction for High-Recall Retrieval: Less is More", CIKM 2018
    Haotian Zhang, Mustafa Abualsaud, Nimesh Ghelani, Mark Smucker, Gordon Cormack and Maura Grossman

  • [14] "A System for Efficient High-Recall Retrieval", SIGIR 2018
    Mustafa Abualsaud, Nimesh Ghelani, Haotian Zhang, Mark Smucker, Gordon Cormack and Maura Grossman
    [pdf] [demo]

  • [13] "A Study of Immediate Requery Behavior in Search", CHIIR 2018
    Haotian Zhang, Mustafa Abualsaud and Mark Smucker
    [pdf] [slide] [bib]

  • [12] "Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval", IRJ
    Haotian Zhang, Gordon Cormack, Maura Grossman and Mark Smucker

  • [11] "UWaterlooMDS at the TREC 2017 Common Core Track", TREC 2017
    Haotian Zhang, Mustafa Abualsaud, Nimesh Ghelani, Angshuman Ghosh, Mark Smucker, Gordon Cormack and Maura Grossman

  • [10] "Integrating Lexical and Temporal Signals in Neural Ranking Models for Searching Social Media Streams", SIGIR Neu-IR 2017
    Jinfeng Rao, Hua He, Haotian Zhang, Ferhan Ture, Royal Sequiera, Salman Mohammed, and Jimmy Lin
    [pdf] [bib]

  • [9] "Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering", SIGIR Neu-IR 2017
    Royal Sequiera, Gaurav Baruah, Zhucheng Tu, Salman Mohammed, Jinfeng Rao, Haotian Zhang, and Jimmy Lin
    [pdf] [bib]

  • [8] "Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering", SIGIR 2017
    Haotian Zhang, Jinfeng Rao, Jimmy Lin, Mark Smucker
    [pdf] [bib]

  • [7] "Analyzing Opinion Dynamics in Online Social Networks", BigDIA 2016

    Robin Cohen, Alan Tsang, Krishna Vaidyanathan, and Haotian Zhang

  • [6] "Optimizing Nugget Annotations with Active Learning", CIKM 2016, USA

    Gaurav Baruah, Haotian Zhang, Rakesh Guttikonda, Jimmy Lin, Mark D. Smucker and Olga Vechtomova

    [pdf] [bib]

  • [5] "Sampling Strategies and Active Learning for Volume Estimation", SIGIR 2016, Italy

    Haotian Zhang, Jimmy Lin, Gordon Cormack, Mark Smucker

    [pdf] [bib]

  • [4] "WaterlooClarke: TREC 2015 Total Recall Track", TREC 2015, US

    Haotian Zhang, Wu Lin, Yipeng Wang, Charles Clarke, Mark Smucker
    [pdf] [presentation] [poster] [bib]

  • [3] "Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings", ACL 2015, Beijing

    Haotian Zhang*, Luchen Tan*(Equal Contribution), Charles Clarke, Mark Smucker
    [pdf] [poster] [bib]

  • [2] "An Ontology-Based Data Exploration Tool for Key Performance Indicators", OTM 2014, Italy

    Claudia Diamantini, Domenico Potena, Emanuele Storti and Haotian Zhang
    [pdf] [bib]

  • [1] "Java Source Code Static Check Eclipse Plug-In Based on Common Design Pattern", WCSE 2013, Hong Kong

    Haotian Zhang and Shu Liu

  • A System for Efficient High-Recall Retrieval.

    Code for HiCAL
  • Castorini: Deep Neural Network Frameworks for Question Answering

    Code for Castorini
  • Continuous Active Learning for TREC Total Recall (BMI) 2015

    Code for the local version of the BMI implementation

© Haotian Zhang@UWaterloo 2018