PhD Seminar • Artificial Intelligence • Less is More: Parameter-Free Text Classification with Gzip

Wednesday, March 29, 2023 — 12:00 PM to 1:00 PM EDT

Please note: This PhD seminar will take place online.

Zhiying Jiang, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

Deep neural networks (DNNs) are often used for text classification because of their high accuracy. However, DNNs can be computationally intensive: they require millions of parameters and large amounts of labeled data, which makes them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice.

In this paper, we propose a non-parametric alternative to DNNs that is easy, lightweight, and universal in text classification: a simple compressor such as gzip combined with a $k$-nearest-neighbor classifier. Without any trainable parameters, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also excels in few-shot settings, where labeled data are too scarce for DNNs to achieve satisfactory accuracy.
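
The idea of pairing a compressor with a $k$-nearest-neighbor classifier can be sketched roughly as follows. This is an illustrative sketch only, not the speaker's implementation: the abstract does not state which distance is used, so the normalized compression distance (NCD) below is an assumption, and the function names (`clen`, `ncd`, `classify`) and the toy training set are hypothetical.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (an assumed choice of distance).

    If x and y share structure, compressing them together costs little
    more than compressing the larger one alone, so the value is small.
    """
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training texts closest to query under NCD."""
    neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
TRAIN = [
    ("the team scored a late goal to win the football match in the final minute", "sports"),
    ("the goalkeeper saved a penalty and the home team won the cup final", "sports"),
    ("the central bank raised interest rates to curb inflation and stabilise markets", "finance"),
    ("shares fell sharply after the company reported lower quarterly profits", "finance"),
]

if __name__ == "__main__":
    print(classify("the striker scored twice in the football match", TRAIN, k=3))
```

Nothing here is trained: the only choices are the compressor and $k$, which matches the parameter-free framing of the abstract. Each prediction compresses the query against every training text, so the cost grows linearly with the size of the training set.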


To attend this PhD seminar on Zoom, please go to https://uwaterloo.zoom.us/j/96169405889.

Location 
Online PhD seminar
200 University Avenue West

Waterloo, ON N2L 3G1
Canada