Please note: This PhD seminar will take place in DC 2310.
Amandeep Kaur, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor James R. Wallace
Social media is a rich and influential source of information for qualitative researchers. However, its vast volume and diversity present significant challenges, which computational techniques such as topic modeling can help address. Yet qualitative researchers often struggle to use these techniques because they lack programming expertise and are concerned about preserving the nuanced aspects of their research, such as contextual understanding, subjective interpretation, and ethical considerations around their data.
To address this issue, this thesis explores the integration of BERTopic, an advanced Large Language Model (LLM)-based topic modeling method, into the Computational Thematic Analysis (CTA) Toolkit to support qualitative analysis of social media data. We conducted interviews and hands-on evaluations in which qualitative researchers compared topics produced by three modeling techniques: Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and BERTopic. Participants prioritized topic relevance, logical organization, and the capacity to reveal unexpected relationships within the data, and they valued detailed, coherent clusters that support deeper understanding and actionable insights. Eight of the 12 participants favored BERTopic for its ability to uncover hidden connections. These findings underscore the potential of LLM-based tools to provide deeper, more nuanced insights in qualitative analysis of social media data.