Talk:Algorithms Seminar/Spring09


Revision as of 23:44, 23 January 2009

== 23 Jan ==

'''Nathan'''

: What does it mean for an algorithm to be O(X)-competitive?
: What are some good methods to help make k-means robust to outliers?
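On the outlier question: much of the sensitivity comes from k-means using the ''mean'' as each cluster's center. A tiny sketch (my own toy data and helper functions, not from the seminar) contrasts the mean with the median-style center used by robust variants such as k-medoids or k-medians:

```python
# Illustrative sketch: why an outlier hurts a mean-based center.
# The mean minimizes the sum of SQUARED distances, so one extreme
# point can drag it far from the bulk of the cluster; the median
# (used by k-medians-style variants) barely moves.

def mean(points):
    return sum(points) / len(points)

def median(points):
    s = sorted(points)
    return s[len(s) // 2]

cluster = [1.0, 2.0, 3.0, 4.0, 100.0]  # one extreme outlier
print(mean(cluster))    # → 22.0  (dragged toward the outlier)
print(median(cluster))  # → 3.0   (stays with the bulk of the data)
```

This is only a one-dimensional caricature, but the same effect holds coordinate-wise in higher dimensions.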

'''Piyush'''

: My feeling is that the k-means algorithm might fare poorly if there are substantial differences between the original cluster sizes (or, more generally, between the cluster distributions). Say, for example, the data has one very large cluster and two reasonably small ones (and assume they are all spherically shaped). The way k-means partitioning works (i.e., minimizing the sum of squared distances from the cluster centers) might make the algorithm "split" the large cluster and assign some of its points to the smaller clusters. If this happens, the result would clearly not be the correct clustering. Are there ways to circumvent such a problem in k-means?
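To make the mechanism behind this concern concrete, here is a minimal one-dimensional Lloyd's iteration (function name and data are my own sketch, not from the seminar). The only quantity being minimized is the total squared distance to the nearest center, and nothing in that objective cares about preserving the "true" clusters:

```python
# Minimal 1-D Lloyd's iteration for k-means (illustrative sketch).
# Alternates the two standard steps:
#   1. assignment: each point joins its nearest center,
#   2. update: each center moves to the mean of its assigned points.

def kmeans_1d(points, centers, iters=20):
    for _ in range(iters):
        # assignment step
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[i].append(p)
        # update step (keep a center in place if its cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```

For example, `kmeans_1d([0.0, 1.0, 2.0, 10.0, 11.0, 12.0], [0.5, 10.5])` converges to `[1.0, 11.0]`. Since the update step is driven purely by sums of squared distances, a sufficiently large or dense cluster can indeed attract more than one center, which is the splitting behavior the question describes.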

'''John Meier'''

: What is a hypersphere/hyperplane?
: How does superpolynomial time differ from non-deterministic polynomial time?
: It seems like k-means++ offers several improvements over traditional k-means without requiring any significant tradeoffs. Has k-means++ become the norm in applications that had been using the original algorithm?
: Are there any methods for adaptively changing k, the number of clusters, during a run of the algorithm? (e.g., increment k and try again if the maximum point-to-centroid distance in any cluster exceeds a threshold)
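Regarding the k-means++ question, the seeding rule itself is short. Here is a sketch (one-dimensional points; function name and data are my own) of the D² sampling step from Arthur and Vassilvitskii's paper: after the first center is chosen uniformly, each subsequent center is drawn with probability proportional to the squared distance from the nearest center already chosen:

```python
import random

# Sketch of k-means++ seeding (Arthur & Vassilvitskii, 2007) in 1-D.
# Each new center is sampled with probability proportional to D(x)^2,
# the squared distance from x to its nearest already-chosen center.

def kmeanspp_seed(points, k, seed=0):
    rng = random.Random(seed)
    centers = [rng.choice(points)]          # first center: uniform
    while len(centers) < k:
        # D(x)^2 for every point, given the centers picked so far
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:                    # inverse-CDF sampling
                centers.append(p)
                break
    return centers
```

Because already-chosen centers have D² = 0, they are (almost surely) never picked twice, and far-apart points are strongly favored — which is where the improved expected approximation guarantee comes from.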