1. Give an example of a dataset consisting of four data vectors where there exist two different optimal (minimum Sum of Squared Errors) 2-means (k = 2) clusterings of the dataset.
a. Calculate the optimal SSE value for your example.
b. What defines the number of optimal solutions?
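One way to check a candidate answer to question 1 is to enumerate every split of the four vectors into two non-empty clusters and compare the SSE values. The sketch below does this by brute force; the helper names (sse, two_partitions, optimal_two_means) and the unit-square dataset are illustrative assumptions, not part of the question, and only 2-D vectors are handled.

```python
from itertools import combinations
from math import isclose


def sse(cluster):
    """Sum of squared Euclidean distances from each 2-D point to the cluster centroid."""
    n = len(cluster)
    cx = sum(p[0] for p in cluster) / n
    cy = sum(p[1] for p in cluster) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in cluster)


def two_partitions(points):
    """Yield every split of distinct points into two non-empty clusters, each split once."""
    first, rest = points[0], points[1:]
    # Pinning the first point to cluster A avoids listing (A, B) and (B, A) separately.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            a = [first, *extra]
            b = [p for p in rest if p not in extra]
            yield a, b


def optimal_two_means(points):
    """Return the minimum total SSE and all partitions that attain it (brute force)."""
    scored = [(sse(a) + sse(b), a, b) for a, b in two_partitions(points)]
    best = min(score for score, _, _ in scored)
    winners = [(a, b) for score, a, b in scored if isclose(score, best)]
    return best, winners


# Assumed example dataset: the corners of the unit square.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
best, winners = optimal_two_means(square)
print(best, len(winners))
for a, b in winners:
    print(a, b)
```

For this assumed dataset the minimum SSE is 1.0 and it is attained by two partitions: the left/right split and the top/bottom split. Stretching the square into a non-square rectangle leaves a single optimum, which hints at the role symmetry plays in part b.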
2. a. Given k clusters and their respective cluster sizes s1, s2, ..., sk, what is the probability that two random data vectors (drawn with replacement from the clustered dataset) belong to the same cluster?
b. Now, assume you are given this probability (you do not have the si's or k) and the fact that the clusters are equally sized, can
OR