University of Waterloo   

Haotian Zhang

Data System Group

David R. Cheriton School of Computer Science

University of Waterloo

Email: h435zhan@uwaterloo.ca

I am a 4th-year PhD student in the Information Retrieval group and Data System group of the David R. Cheriton School of Computer Science at the University of Waterloo, where I have studied since 2015. My supervisors are Mark D. Smucker and Gordon V. Cormack. I also work closely with Jimmy Lin, Maura Grossman, and Charlie Clarke.

My research interests include Information Retrieval, NLP, and Machine Learning (especially active learning and deep learning). My current research focuses on building a High-Recall-Information-Retrieval (HRIR) system to improve the efficiency and effectiveness of relevant document retrieval [4,5,6,11,12,14,15]. Another part of my research investigates user behavior data from search logs [13]. I also work on deep neural networks and apply them to Question Answering and ad hoc search problems [3,8,9,10].

Before studying at the University of Waterloo, I studied at Harbin Institute of Technology. I have studied and/or worked in Canada, the USA, France, and Italy for more than three months.

Google Scholar Page of Haotian


  • Software Engineer Intern at Wish, Toronto. May 2018 - Aug. 2018

    Product Boost - ad prediction and ranking. Model design and evaluation for the recommendation system. Spam product detection.
  • Research Intern at Oracle Lab, MA, USA. Jun. 2016 - Aug. 2016

    Learning to rank for eCommerce Search.
  • Software Engineer Intern at Adobe Systems (Beijing). Oct. 2011 - July 2012

    TaaS System Implementation [Bachelor Thesis]

  • The code for the High-Recall Information Retrieval system is now public: HiCAL. July, 2018.

  • Join Wish as a software engineer intern, working on machine learning methods for solving eCommerce problems. April, 2018.

  • Pass the PhD Comp2 examination [proposal], now a PhD candidate. April, 2018.

  • Present user requery behaviour work [13] at CHIIR 2018 [New Jersey]. Mar, 2018.

  • Talk about High-Recall-Information-Retrieval work with Google Cloud Team [Mountain View]. Dec, 2017.

  • Present our work at TREC CORE 2017 [11,12]. Our runs ranked highest among all 75 runs submitted by 15 different teams. Cheers! Nov, 2017.


  • [15] "Effective User Interaction for High-Recall Retrieval: Less is More", CIKM 2018
    Haotian Zhang, Mustafa Abualsaud, Nimesh Ghelani, Mark Smucker, Gordon Cormack and Maura Grossman
    [pdf]

  • [14] "A System for Efficient High-Recall Retrieval", SIGIR 2018
    Mustafa Abualsaud, Nimesh Ghelani, Haotian Zhang, Mark Smucker, Gordon Cormack and Maura Grossman
    [pdf] [demo]

  • [13] "A Study of Immediate Requery Behavior in Search", CHIIR 2018
    Haotian Zhang, Mustafa Abualsaud and Mark Smucker
    [pdf] [slide] [bib]

  • [12] "Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval", arXiv
    Haotian Zhang, Gordon Cormack, Maura Grossman and Mark Smucker
    [pdf]

  • [11] "UWaterlooMDS at the TREC 2017 Common Core Track", TREC 2017
    Haotian Zhang, Mustafa Abualsaud, Nimesh Ghelani, Angshuman Ghosh, Mark Smucker, Gordon Cormack and Maura Grossman
    [pdf]

  • [10] "Integrating Lexical and Temporal Signals in Neural Ranking Models for Searching Social Media Streams", SIGIR Neu-IR 2017
    Jinfeng Rao, Hua He, Haotian Zhang, Ferhan Ture, Royal Sequiera, Salman Mohammed, and Jimmy Lin
    [pdf] [bib]

  • [9] "Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering", SIGIR Neu-IR 2017
    Royal Sequiera, Gaurav Baruah, Zhucheng Tu, Salman Mohammed, Jinfeng Rao, Haotian Zhang, and Jimmy Lin
    [pdf] [bib]

  • [8] "Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering", SIGIR 2017
    Haotian Zhang, Jinfeng Rao, Jimmy Lin, Mark Smucker
    [pdf] [bib]

  • [7] "Analyzing Opinion Dynamics in Online Social Networks", BigDIA 2016

    Robin Cohen, Alan Tsang, Krishna Vaidyanathan, and Haotian Zhang
    [pdf]

  • [6] "Optimizing Nugget Annotations with Active Learning", CIKM 2016, USA

    Gaurav Baruah, Haotian Zhang, Rakesh Guttikonda, Jimmy Lin, Mark D. Smucker and Olga Vechtomova

    [pdf] [bib]

  • [5] "Sampling Strategies and Active Learning for Volume Estimation", SIGIR 2016, Italy

    Haotian Zhang, Jimmy Lin, Gordon Cormack, Mark Smucker

    [pdf] [bib]

  • [4] "WaterlooClarke: TREC 2015 Total Recall Track", TREC 2015, US

    Haotian Zhang, Wu Lin, Yipeng Wang, Charles Clarke, Mark Smucker
    [pdf] [presentation] [poster] [bib]

  • [3] "Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings", ACL 2015, Beijing

    Haotian Zhang*, Luchen Tan*(Equal Contribution), Charles Clarke, Mark Smucker
    [pdf] [poster] [bib]

  • [2] "An Ontology-Based Data Exploration Tool for Key Performance Indicators", OTM 2014, Italy

    Claudia Diamantini, Domenico Potena, Emanuele Storti and Haotian Zhang
    [pdf] [bib]

  • [1] "Java Source Code Static Check Eclipse Plug-In Based on Common Design Pattern", WCSE 2013, Hong Kong

    Haotian Zhang and Shu Liu
    [pdf]


  • A System for Efficient High-Recall Retrieval.

    Code for HiCAL
  • Castorini: Deep Neural Network Frameworks for Question Answering

    Code for Castorini
  • Continuous Active Learning for TREC Total Recall (BMI) 2015

    Code for the local version of the BMI implementation


© Haotian Zhang@UWaterloo 2018