How To Choose The Optimal Number Of Clusters

Many factors go into identifying the optimal number of clusters. Data scientists often use techniques such as the Elbow method or metrics such as the Silhouette Coefficient to choose the number of clusters. Even with these methods, however, it is very difficult to identify the exact number of categories. This applies to algorithms such as K-Means, which require the user to specify the number of clusters in advance.
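As a rough illustration of the Elbow method and the Silhouette Coefficient, the sketch below scores several candidate cluster counts. It assumes scikit-learn is available and uses synthetic data; the helper name `score_cluster_counts` is made up for this example, and none of this is tied to the Explorer dashboard.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with four well-separated groups (illustrative only).
X, _ = make_blobs(
    n_samples=500,
    centers=[[-5, -5], [-5, 5], [5, -5], [5, 5]],
    cluster_std=0.6,
    random_state=0,
)

def score_cluster_counts(X, k_values, random_state=0):
    """Return {k: (inertia, silhouette)} for each candidate number of clusters."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # Inertia feeds the Elbow plot; silhouette measures cluster separation.
        results[k] = (model.inertia_, silhouette_score(X, labels))
    return results

scores = score_cluster_counts(X, range(2, 9))
for k, (inertia, sil) in scores.items():
    print(f"k={k}  inertia={inertia:10.1f}  silhouette={sil:.3f}")

# The k with the highest silhouette is a candidate, not a final answer.
best_k = max(scores, key=lambda k: scores[k][1])
print("Highest silhouette at k =", best_k)
```

On clean synthetic data like this, the scores tend to point clearly at one value. On real data the inertia curve often has no sharp elbow and the silhouette values for neighbouring counts are close, which is why these methods rarely settle the question on their own.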

On the other hand, some clustering algorithms claim to find the optimal number of clusters automatically, but how well this works depends heavily on the data.
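To make this dependence concrete, here is a minimal sketch using DBSCAN from scikit-learn as a stand-in for an algorithm that does not take a cluster count as input. The choice of DBSCAN, the synthetic data, the `eps` values, and the helper `count_clusters` are all assumptions made for illustration.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic 2-D data with four well-separated groups (illustrative only).
X, _ = make_blobs(
    n_samples=500,
    centers=[[-5, -5], [-5, 5], [5, -5], [5, 5]],
    cluster_std=0.6,
    random_state=0,
)

def count_clusters(X, eps, min_samples=5):
    """Number of clusters DBSCAN reports, ignoring the noise label (-1)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return len(set(labels.tolist()) - {-1})

# The same data yields different cluster counts as the neighbourhood radius changes.
counts = {eps: count_clusters(X, eps) for eps in (0.2, 1.0, 20.0)}
for eps, n in counts.items():
    print(f"eps={eps:>5}: {n} clusters")
```

The number of clusters is not typed in, but it is still determined by how the neighbourhood radius relates to the scale and density of the data, so the result needs the same scrutiny as a manually chosen count.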

In practice, we recommend the following steps, which we have found useful:

  1. understand the data as much as possible to get an idea of the topics
  2. try different numbers of clusters and quickly check the results under the Explorer dashboard
  3. increase or decrease the number of clusters based on step 2
  4. employ the merge functionality of the Explorer dashboard to combine clusters that are conceptually close to one another

Note: Step 2 can include different clustering algorithms (e.g. Auto-cluster and KMeans) as well as different numbers of clusters within one algorithm (e.g. KMeans).
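Conceptually, the merge in step 4 amounts to relabelling: several cluster ids are mapped to a single id. The sketch below shows the idea in plain Python. The function `merge_clusters` and the toy labels are hypothetical and do not describe how the Explorer dashboard implements merging.

```python
from collections import Counter

def merge_clusters(labels, groups):
    """Relabel so that every cluster id in a group maps to the group's first id.

    labels: list of cluster ids, one per data point.
    groups: list of lists; each inner list holds ids judged conceptually close.
    """
    mapping = {old: group[0] for group in groups for old in group}
    # Ids that are not in any group keep their original label.
    return [mapping.get(label, label) for label in labels]

# Toy labels from a deliberately over-clustered run (4 clusters).
labels = [0, 1, 2, 3, 1, 0, 3, 2]

# Suppose inspection shows clusters 0 and 2 cover one topic, and 1 and 3 another.
merged = merge_clusters(labels, [[0, 2], [1, 3]])

print("before:", Counter(labels))
print("after: ", Counter(merged))
```

Starting with slightly too many clusters and merging afterwards is often easier than splitting, because merging only needs a judgment about which groups belong together.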