DR+Clustering

From ResearchWiki


Revision as of 07:07, 5 October 2012

CS 6150: Graduate Algorithms Project

High dimensions are weird.

A mathematician and his best friend, an engineer, attend a public lecture on geometry in thirteen-dimensional space.

"How did you like it?" the mathematician wants to know after the talk.

"My head's spinning," the engineer confesses. "How can you develop any intuition for thirteen-dimensional space?"

"Well, it's not even difficult. All I do is visualize the situation in arbitrary N-dimensional space and then set N = 13."

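One concrete sense in which high dimensions are weird, and one that matters for distance-based clustering, is distance concentration. As the dimension grows, the nearest and farthest neighbours of random points end up at almost the same distance. The Python sketch below is only an illustration, built on assumptions of its own: 50 points drawn uniformly from the unit cube, a fixed seed, and a made-up function name `distance_contrast`. It prints the relative gap (max - min) / min between all pairwise distances for a few dimensions, and that gap shrinks as the dimension grows. It needs Python 3.8+ for `math.dist`.

```python
import math
import random


def distance_contrast(dim, n_points=50, seed=0):
    """Relative spread (max - min) / min of all pairwise Euclidean distances
    between n_points drawn uniformly at random from the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(pts[i], pts[j])
             for i in range(n_points) for j in range(i + 1, n_points)]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    # Smaller values mean "nearest" and "farthest" are harder to tell apart.
    print(f"dim={dim:5d}  contrast={distance_contrast(dim):.3f}")
```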

And Clustering is hard.

However, Amit Daniely, Nati Linial, and Michael Saks argue that it's only hard when it does not matter!

Goal

Understand the impact of dimensionality reduction methods on clustering. Try to uncover the relationship, if one exists, between a dimensionality reduction method and a clustering technique of your choice.
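To make the goal concrete, here is a minimal pipeline sketch in Python built on one particular pairing. The reduction step is a Gaussian random projection (Johnson-Lindenstrauss style), and the clustering step is Lloyd's k-means with deterministic farthest-first initialization. Both choices, and all function names (`random_projection`, `kmeans`, `reduce_then_cluster`), are hypothetical examples and not methods prescribed by the project. Any other reduction method or clustering technique fits the same two-step shape.

```python
import math
import random


def random_projection(points, target_dim, seed=0):
    """Map each d-dimensional point to target_dim dimensions with a random Gaussian matrix.

    Entries are i.i.d. N(0, 1/target_dim), the usual Johnson-Lindenstrauss-style
    scaling under which squared distances are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(target_dim)]
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix] for p in points]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point whose nearest existing center is farthest away.
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def reduce_then_cluster(points, k, target_dim, seed=0):
    """Hypothetical two-step pipeline: reduce dimensionality, then cluster."""
    return kmeans(random_projection(points, target_dim, seed), k)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data (not a project dataset): two 50-dimensional Gaussian blobs far apart.
    blob_a = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
    blob_b = [[rng.gauss(20.0, 1.0) for _ in range(50)] for _ in range(20)]
    print(reduce_then_cluster(blob_a + blob_b, k=2, target_dim=10))
```

You can probe whether a reduction method and a clustering technique interact by swapping one of the two functions while holding the other fixed.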

Data

1. MNIST Digits data on Sam Roweis's data page

2. Gisette on the UCI repository

3. Olivetti Faces on Sam Roweis's data page

4. Dorothea on the UCI repository

5. A fifth dataset will only be revealed later to add spice to the contest.

Leader Board

Data | # Data points | # Dimensions | # Clusters | Team Name | # Target Dimensions | Dimensionality Reduction Method | Clustering Technique | Rand Index | NMI | Accuracy
MNIST training data | 60000 | 784 | 10 | | | | | | |
Gisette | 13500 | 5000 | 2 | | | | | | |
Olivetti Faces | 400 | 4096 | 40 | | | | | | |
Dorothea | 1950 | 100000 | 2 | | | | | | |
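The leader board scores each entry by Rand Index, NMI and Accuracy. Below is a minimal stdlib-only Python sketch of common textbook versions of all three. It rests on a few assumptions. NMI here is mutual information normalized by sqrt(H(truth) * H(pred)); other normalizations exist, and this page does not say which one the leader board uses. Accuracy is taken as the best one-to-one matching of cluster ids to class labels, found by brute force over permutations. That is fine for the 2-cluster datasets, but for 10 or 40 clusters the Hungarian algorithm is the practical choice.

```python
import math
from collections import Counter
from itertools import combinations, permutations


def rand_index(truth, pred):
    """Fraction of point pairs on which the two labelings agree
    (both put the pair together, or both split it)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)


def _entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Mutual information normalized by sqrt(H(truth) * H(pred))."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    a, b = Counter(truth), Counter(pred)
    mi = sum(c / n * math.log(n * c / (a[t] * b[p])) for (t, p), c in joint.items())
    h_t, h_p = _entropy(truth), _entropy(pred)
    if h_t == 0 or h_p == 0:
        # Convention assumed here: 1 if both labelings are trivial, else 0.
        return 1.0 if h_t == h_p else 0.0
    return mi / math.sqrt(h_t * h_p)


def clustering_accuracy(truth, pred):
    """Best accuracy over one-to-one mappings of cluster ids to class labels.
    Brute force over permutations: only practical for a handful of clusters."""
    clusters = sorted(set(pred))
    classes = sorted(set(truth))
    # Pad with None so surplus clusters can be left unmatched (they score 0).
    classes += [None] * max(0, len(clusters) - len(classes))
    joint = Counter(zip(pred, truth))
    best = max(
        sum(joint[(c, cls)] for c, cls in zip(clusters, perm))
        for perm in permutations(classes, len(clusters))
    )
    return best / len(truth)
```

All three functions ignore the actual cluster ids, so a clustering that only renames the classes, such as `[1, 1, 0, 0]` against `[0, 0, 1, 1]`, gets a perfect score on every metric.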