Xu Chu successfully defended his PhD in August 2017. He was supervised by Professor Ihab Ilyas, a member of the David R. Cheriton School of Computer Science’s Data Systems Group.
Xu came to the University of Waterloo in 2010 as a fourth-year undergraduate exchange student from China’s Nanjing University. After completing his undergraduate degree in computer science at Nanjing University, he enrolled in the graduate program as a master’s student in 2011.
Xu showed much promise, so Professor Ilyas recommended he switch from the master’s to the PhD program. In 2015, Xu was invited to join the prestigious Microsoft Research PhD Fellowship Program, a two-year industry fellowship for outstanding PhD candidates.
Although he has only just completed his doctorate, Xu was recently offered a tenure-track position in the College of Computing at Georgia Institute of Technology in Atlanta. We sat down with Xu to learn more about his experience and his trajectory from undergraduate exchange student to PhD candidate to tenure-track faculty.
How did you learn about the University of Waterloo?
Waterloo has a formal exchange agreement through the UW 3+1 Program with several top universities in China. Undergraduate students at Nanjing University, my home university, can come to Waterloo to study for a year and students here can go to Nanjing.
Computer science at Waterloo is very strong, so Nanjing’s students are attracted to study here. Similarly, Nanjing has a number of strong programs, so people at Waterloo are attracted to those. It’s a beneficial exchange for students at both universities.
When I learned about this exchange program I decided to come to Waterloo to take advantage of this fantastic opportunity.
Was this your first trip to Canada?
It was my first trip abroad! And it was my first time travelling by plane — a 13-hour flight, almost the longest single flight you can take.
What were your early impressions of life in Canada?
Some senior students helped me and other exchange students rent a house in Waterloo, so we had a place to stay when we arrived in Canada. We took a taxi from the airport to the rental house in Waterloo, but we had nothing to eat when we arrived. I checked a map and the closest grocery store was a 20-minute walk away. This is totally different from life in China, where everything is available everywhere.
I had some training in English, but it was taught by Chinese teachers who didn’t speak much conversational English. So, improving my English skills was perhaps the most important task I had to master. I improved my English by taking classes, interacting with fellow students, doing academic presentations, and watching English TV shows. My all-time favourite show was Friends. [laughs] I guess you could say that I had six English instructors.
What did you do during your undergraduate exchange year at Waterloo?
I took computer science courses just like domestic undergraduates, completed assignments and wrote exams. I was already familiar with the subject matter, but learning it in another language certainly helped improve my English substantially.
I didn’t do a lot of research until I started my master’s degree in 2011, but I did take advantage of a URA, the Undergrad Research Assistantship program. Ihab took me on as a research assistant and I worked about five hours a week with one of his PhD students. It was my first introduction to research.
Tell us a bit about your graduate experience with Professor Ihab Ilyas.
Ihab is a rigorous supervisor, but he’s fair and systematic and guides students so they meet milestones. During my early years as a PhD student he had me concentrate on meeting research goals, but during my senior years he wanted me to focus on collaborating with researchers here, at other universities and in industry. He’s guided me strategically all along the way and it paid off.
What’s the overarching goal of your PhD research?
My research aims to make dirty data clean. We call ourselves data janitors. [laughs]
Data dirtiness can come from anywhere. For example, when you enter someone into a personnel record you might make a typo, you might enter the same person twice, or use the person’s initials instead of the full name. It’s one person, but the records are all different. And the problem becomes more complex when you integrate data sets — for example, personnel records at the School of Computer Science with those of the university. You want a unified database, but the integration process itself can introduce errors.
The aim of my research was to detect data dirtiness and deal with it by automatically updating the data to the correct values. Of course, you may not know what the correct data is, so that’s the challenge. This process is still done manually, which is slow and difficult to scale up. My research aimed to have a computer do it automatically.
Tell us about your job search experience.
I started job hunting in November 2016 and I applied to three categories of employers — universities, industry labs and industry. I was most interested in university research, so I focused on faculty positions, but I applied to all three employer categories.
I got interviews at a dozen or so universities, including Cornell, Georgia Tech and the University of British Columbia. On the industry lab side, I got an offer from Microsoft Research. From industry I had an applied scientist offer from Amazon.
My original thinking was that if I got an offer from a top university, I’d have strong students and be in a stimulating environment where I could pursue my research interests. Georgia Tech is a fantastic school, so I eventually accepted their offer.
What research will you be pursuing at Georgia Tech?
I’m still interested in data cleaning and will continue to collaborate with Ihab’s group, but I’m going to explore other research directions, too. I’m not sure which ones at this point, but I want to figure that out before I start at Georgia Tech in January.
Do you have any advice for current and prospective graduate students?
Computer science at Waterloo is fantastic and is every bit the equal of top computer science schools around the world.
As graduate students, we have all the resources we need to succeed. The school has close to 100 faculty members, which is more than any other computer science school in Canada and almost all in the United States. In the Data Systems Group alone we have more than a dozen faculty members. And that’s just one group. We have research breadth and depth that few other schools can offer.
The Cheriton School has fantastic collaborative opportunities, so you can do research in one area but tap into the expertise and resources of another. We’re encouraged to attend conferences and to build networks.
If I could offer just one bit of advice I’d say try to land an internship during your graduate degree. In 2015, I became a Microsoft Research PhD Fellow. Each year Microsoft picks 12 PhD students across North America for this two-year fellowship program. That year, two of the 12 students were from the Cheriton School of Computer Science: Laura Inozemtseva and me. That two of the fellows were from Waterloo speaks volumes about our standing and shows we compete favourably with the best computer science schools.
This internship was great for a variety of reasons. I got a strong recommendation letter from Microsoft’s principal researcher. Recommendation letters are extremely important. Publications obviously matter a lot as well, but so does what people say about you. It was a great opportunity and I’m glad I took advantage of it.