difference between correlation and independence, k-means clustering pros and cons

Question

Anonymous · Accepted Answer

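covariance: Cov(X, Y) = E[XY] - E[X]E[Y]. Correlation is the covariance divided by the two standard deviations. Both measure only linear dependence, so they can be zero even if the variables are not independent; you can usually set up a tricky random variable that satisfies this, e.g. X symmetric around 0 and Y = X^2.
independence: p(x, y) = p(x)p(y) for all x, y. This automatically implies zero covariance (plug it in: E[XY] factors into E[X]E[Y]).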
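
To make the "zero covariance but not independent" case concrete, here is a small sketch. The specific choice of X uniform on {-1, 0, 1} with Y = X^2 is an illustrative assumption, not something from the interview itself; exact fractions are used so the zero is not a rounding artifact.

```python
from fractions import Fraction

# Assumed example: X is uniform on {-1, 0, 1} and Y = X^2.
xs = [-1, 0, 1]
p = Fraction(1, 3)  # probability of each value of X

e_x = sum(p * x for x in xs)            # E[X]  = 0
e_y = sum(p * x**2 for x in xs)         # E[Y]  = 2/3
e_xy = sum(p * x * x**2 for x in xs)    # E[XY] = E[X^3] = 0
cov = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = p       # P(X=0)
p_y0 = p       # Y=0 happens only when X=0
p_joint = p    # P(X=0, Y=0)

print(cov)                   # 0
print(p_joint, p_x0 * p_y0)  # 1/3 1/9
```

The covariance is exactly zero, yet the joint probability does not factor, so X and Y are dependent.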

k-means:
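pros: good when you know the number of distinct clusters and there isn't too much overlap between them. Run-time assignment is pretty fast: just compare the point to the centroids, O(num_means * num_dimensions). Interpretable (each cluster is summarized by its centroid), and variants such as k-medoids can use custom distance functions, though the standard mean update is only optimal for squared Euclidean distance.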
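
The run-time cost mentioned above can be sketched as a nearest-centroid lookup. The function name `assign` and the sample centroids are hypothetical, chosen only for illustration; squared Euclidean distance is assumed.

```python
import math

def assign(point, centroids):
    """Return the index of the nearest centroid.

    One pass over the centroids, each distance costing one pass over
    the dimensions: O(num_means * num_dimensions).
    """
    best_index, best_dist = 0, math.inf
    for i, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))  # squared Euclidean
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index

# Hypothetical trained centroids for two clusters.
centroids = [(0.0, 0.0), (10.0, 10.0)]
print(assign((1.0, 2.0), centroids))  # 0
print(assign((9.0, 8.5), centroids))  # 1
```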
cons: needs a meaningful distance function, which is hard when features are on differing magnitudes (normalize first). Has to be trained, and training is always an approximation because finding the optimal solution is NP-hard. The usual iterative training (Lloyd's algorithm) does converge, but only to a local optimum, so bad initial points can make the clusters bad. Hard to tell how many clusters is sufficient. Cannot model complex clusters (think clusters of concentric rings).
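
A minimal sketch of the initialization problem, assuming plain Lloyd's algorithm with squared Euclidean distance. The function names and the four-point dataset are made up for illustration: the same data converges to two different solutions depending on the starting centroids.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centroids, max_iter=100):
    """Alternate assignment and mean-update steps until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    cost = sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, cost

# Four corners of a wide rectangle: the natural clusters are left and right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

_, good_labels, good_cost = lloyd(points, [(0.0, 0.0), (4.0, 0.0)])
_, bad_labels, bad_cost = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_labels, good_cost)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_cost)    # [0, 1, 0, 1] 16.0
```

Both runs converge, but the second is stuck splitting the data into top and bottom rows at 16 times the cost. In practice this is why people rerun with several random starts or use a seeding scheme such as k-means++.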

C3 AI

C3 AI Interview Question
