CS480/680 Winter 2023 - Introduction to Machine Learning

There is no midterm and no final exam. Instead, undergraduate students (CS480) must participate in a Kaggle competition (25% of final grade). A training dataset will be made available as part of the competition. Your task is to train machine learning algorithms to achieve the highest possible accuracy on the test set. You are free to use any machine learning algorithms from the course. Your grade will be based on the accuracy that you will achieve in the competition (15% of final grade) and a report that describes the algorithm submitted (10% of final grade). The deadline for submitting the report and making submissions to the competition is April 17, 2023 at 11:59 pm (Waterloo time).

The Kaggle competition for the course is now open. To enroll in the competition, click on the private link posted on LEARN (click on "Content" and then "Kaggle Competition"). Follow the instructions on the competition webpage to build and submit your solution.

Submission Format

Submission files should contain two columns: 'id' and 'category'. For an example, view the sample submission under the Data tab. The file should contain a header and have the following format:

id,category
12543,Topwear
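As an illustration of the format above, here is a minimal sketch that writes a submission file with Python's standard csv module. The function name write_submission and the in-memory buffer are assumptions made for this example, not part of the competition's tooling. The only row shown is the sample row from above.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (id, category) pairs as CSV with the required header row.

    `predictions` is any iterable of (id, category) pairs; `out` is a
    writable text stream. Both names are hypothetical, chosen for this sketch.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "category"])  # header required by the competition
    for item_id, category in predictions:
        writer.writerow([item_id, category])


# Write the sample row to an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
write_submission([(12543, "Topwear")], buffer)
print(buffer.getvalue())
```

To produce a real file, pass a file opened with open(path, "w", newline="") instead of the buffer.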

When you submit, please name your team "student#_lastname". For example: "1234567_poupart"

In addition to submitting your predictions on Kaggle, you must submit the following material via LEARN (deadline: April 17 @ 11:59 pm Waterloo time).

Code: When you submit your code, include instructions on how to run it. If your code is a Jupyter Notebook, state which environment (Kaggle, Colab or elsewhere) it is expected to run in and which directory it expects to find the data in. Put those instructions in a text cell at the beginning of the notebook. If your code is not a Jupyter Notebook, provide instructions for running it in a Linux environment, including a description of the dependencies.
Report: The report can be in the form of text cells with the code in a Jupyter Notebook or a separate file. The report consists of brief descriptions of what your code does to leverage:
- Categorical attributes: one or two paragraphs that describe the algorithm/technique that uses the categorical attributes to make predictions
- Noisy text description: one or two paragraphs that describe the algorithm/technique that uses the noisy text description to make predictions
- Images: one or two paragraphs that describe the algorithm/technique that uses the images to make predictions
- Ensemble learning: one or two paragraphs that describe the ensemble algorithm used to combine/boost hypotheses when making predictions
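For the ensemble item above, one simple way to combine hypotheses is majority voting over the predictions of several models. The sketch below is only an illustration under that assumption. It is not a required or recommended method, and the function name majority_vote and the placeholder labels "A", "B" and "C" are hypothetical.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine predictions from several models by majority vote.

    `per_model_predictions` is a list with one inner list per model; the
    inner lists are aligned so that position i refers to the same item.
    Ties go to the label that appears first among the models, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    combined = []
    for votes in zip(*per_model_predictions):
        winner, _count = Counter(votes).most_common(1)[0]
        combined.append(winner)
    return combined


# Three hypothetical models (e.g. one per input modality), two items each.
model_outputs = [
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
]
print(majority_vote(model_outputs))
```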

Marks

Your grade will be based on your score on the private leaderboard as well as the code and report that you submit in LEARN. Marks will be assigned as follows:

Squared accuracy (15 points): You will earn points according to the following formula: 15 * accuracy * accuracy. For example, if you achieve an accuracy of 0.9 on the private leaderboard, then you will earn 15 * 0.9 * 0.9 = 12.15 points.
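The formula above can be written as a small helper for checking how many points a given leaderboard accuracy would earn. The function name squared_accuracy_points and the range check are assumptions added for this sketch. Only the formula itself comes from the marking scheme.

```python
def squared_accuracy_points(accuracy, max_points=15.0):
    """Return max_points * accuracy * accuracy, per the marking formula.

    `accuracy` is a fraction between 0 and 1 (e.g. 0.9 for 90%).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")
    return max_points * accuracy * accuracy


# The worked example from the marking scheme: accuracy 0.9 gives 12.15 points.
print(round(squared_accuracy_points(0.9), 2))
```

Because the score is quadratic in accuracy, gains at high accuracy are worth more than the same gains at low accuracy.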
Code and report (10 points):

Categorical attributes (2.5 points): meaningful use of categorical attributes in the predictions
Noisy Text Description (2.5 points): meaningful use of the text data in the noisyTextDescription field when making predictions
Images (2.5 points): meaningful use of the images in the predictions
Ensemble (2.5 points): meaningful use of an ensemble technique to combine/boost hypotheses in the predictions