The class project lets you extend and practice the knowledge and skills you learned in this class. The project is worth 40% of the class grade. Details are as follows.


You can use probabilistic learning approaches to explore or address a research task, or to build a practical application that interests you. You may work in a group; each project group consists of at most two students.

Grading and Milestones

Please create a GitHub repository to host and maintain your project. Grading is broken into the following milestones:

  1. Project team (5 points): Notify the instructor/TA of the members of your project group.

  2. Mid-term report (30 points): Submit a 3-5 page description of your project (11-point font). The description should include the following information:
    • A brief introduction to the problem you want to solve using probabilistic learning techniques. (10%)
    • The motivation: why do you want to use learning techniques rather than traditional or existing methods? (10%)
    • What you have done so far toward your goal. Note that just “We collected data” will NOT be enough. (40%)
    • Your detailed plan for the rest of the project. (30%)
    • References to the relevant literature. (10%)
  3. Final report (65 points): The final report should be 3-6 pages long (font size no larger than 11 pt) and structured as a short research paper. It should contain the following:
    1. Problem definition and motivation: what problem did you choose? Why is it important or interesting? Why did you use machine learning techniques to solve it? (20 points)
    2. Your solution: the details of the machine learning models/algorithms you chose or developed (or proofs, for theoretical projects) (20 points)
    3. Experimental evaluation (20 points)
    4. Future plan (5 points)

    For theoretical projects, the solution and experimental evaluation are graded as a single component worth 40 points. Note: the final report must include a link to a GitHub repository containing your implementation; we will check your implementation as well. A final report without the GitHub link will receive a grade of zero.

Topics

Any project that uses probabilistic learning as a critical step or component is acceptable.

If you are looking for project ideas, come to office hours and we can brainstorm together. A project can be one of the following:

  1. An application project, e.g., a machine learning application that you find interesting.

  2. Reproduction of published results, e.g., you are interested in a machine learning paper and want to reimplement its model/algorithm to reproduce its experimental results.

  3. A theoretical project, e.g., prove interesting properties of a learning algorithm.

  4. An algorithmic project, e.g., develop a new learning algorithm for a particular type of problem.

  5. Your own research, e.g., you are already working on a project and wish to apply machine learning methods to it.

In general, choose a topic that you find exciting, and convince me that it is important or interesting.

Important: Experimental evaluations should be rigorous, i.e., choose fair baselines, apply cross-validation for hyper-parameter selection, and report both positive and negative results.
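The rigor requirements above can be illustrated with a small, self-contained sketch. It is only one possible example under stated assumptions, not a required template: it uses a synthetic two-class dataset (the hypothetical helper make_toy_data), a k-nearest-neighbour classifier standing in for whatever model your project uses, a majority-class predictor as the baseline, and 5-fold cross-validation on the training split to choose the hyperparameter n_neighbors. The held-out test split is used only once, after the hyperparameter has been fixed.

```python
import random
from collections import Counter

# Hypothetical illustration: synthetic data and k-NN as a stand-in model.


def make_toy_data(n_per_class=100, seed=0):
    """Synthetic 1-D, two-class data: class 0 ~ N(0, 1), class 1 ~ N(3, 1)."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(3.0, 1.0), 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def knn_predict(train, x, n_neighbors):
    """Majority label among the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, n_neighbors):
    """Fraction of test points whose k-NN prediction matches the true label."""
    hits = sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
    return hits / len(test)


def cv_score(train, n_neighbors, k=5):
    """Mean k-fold cross-validation accuracy, computed on the training split only."""
    folds = [train[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        # Fit on every fold except the i-th, evaluate on the i-th.
        fit = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(fit, held_out, n_neighbors))
    return sum(scores) / k


data = make_toy_data()
train, test = data[:150], data[150:]  # the test split is never used for tuning

# Hyperparameter selection by cross-validation on the training split.
candidates = [1, 3, 5, 7, 9]
cv_scores = {c: cv_score(train, c) for c in candidates}
best = max(cv_scores, key=cv_scores.get)

# Baseline: always predict the most frequent training label.
majority = Counter(y for _, y in train).most_common(1)[0][0]
baseline_acc = sum(y == majority for _, y in test) / len(test)
tuned_acc = accuracy(train, test, best)

# Report every candidate's score, not just the winner.
print(f"CV accuracy per n_neighbors: {cv_scores}")
print(f"chosen n_neighbors = {best}, test accuracy = {tuned_acc:.3f}, baseline = {baseline_acc:.3f}")
```

To adapt the sketch, you would replace make_toy_data and knn_predict with your own data loader and model; the essential design choice is that the test split plays no role in choosing the hyperparameter, and that the baseline and all candidate scores are reported whether or not they favor your method.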

Project Examples

  • Kaggle competition tasks
  • Using Stable Diffusion to build a fun image-text generation library
  • Biology and medical study: can we select genes relevant to a disease, such as breast cancer or Alzheimer's disease?
  • Product recommendation: can we use customers' purchase records to recommend products to existing and new customers?
  • Software and security: can we identify Android apps containing malware?
  • Sentiment analysis: can we classify whether a comment is positive or negative?
  • Spam email detection.
  • ...