Cluster - Hybrid

Clustering groups items so that those in the same group/cluster share meaningful similarities, such as specific features or properties. By identifying these patterns, clustering gives the data meaning and supports informed decision-making.

Why can clustering data be beneficial?

Because items in the same group/cluster share meaningful similarities, clustering is a great tool for uncovering hidden patterns in the data.

How to use hybrid-clustering on Relevance AI's platform

This clustering workflow combines different methods of clustering to group entries in a dataset.

Note: The accuracy of this technique is highly dependent on the data.

Once you have uploaded and vectorized your data, select your dataset and locate "Cluster - Hybrid" under Workflows.

Relevance AI - Access to hybrid clustering

Follow the steps in the setup wizard:

  • Select the text field to be used for clustering
  • Select the vector field to be used for clustering
  • Select your desired method for clustering (K-means, DBSCAN, or both combined)
    Note: K-means clustering is more efficient for large datasets. DBSCAN clustering does not handle high-dimensional datasets efficiently, and it discards items that are far from the rest of the data points.
Relevance AI - Cluster-hybrid setup

  • Specify the minimum and maximum number of clusters
  • Enter a name for the new field that will be added to your dataset to store the results
  • Execute the workflow
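
To make the wizard options more concrete, here is a minimal Python sketch of one way DBSCAN and K-means could be combined. It is an illustration only and not Relevance AI's implementation, which this page does not describe. The sketch assumes that DBSCAN runs first and flags far-away items as noise, and that K-means then groups the remaining items, with the number of clusters taken from the DBSCAN result and clamped to the minimum and maximum you specify. The function names (dbscan, kmeans, hybrid_cluster) and the parameters eps and min_samples are hypothetical and do not correspond to fields in the wizard.

```python
import math


def dbscan(points, eps, min_samples):
    """Density-based clustering. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # too isolated: treat as noise for now
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # core point: keep expanding the cluster
    return labels


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def hybrid_cluster(points, eps, min_samples, min_clusters, max_clusters):
    """Assumed combination: DBSCAN removes noise, K-means groups what is left."""
    density_labels = dbscan(points, eps, min_samples)
    kept = [i for i, lab in enumerate(density_labels) if lab != -1]
    found = len({density_labels[i] for i in kept})
    # Clamp the cluster count to the requested range (and to the data size).
    k = min(max(min_clusters, min(max_clusters, found)), len(kept))
    labels = [-1] * len(points)
    if k == 0:
        return labels
    for i, lab in zip(kept, kmeans([points[i] for i in kept], k)):
        labels[i] = lab
    return labels


# Two tight groups of toy 2-D vectors plus one far-away item.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = hybrid_cluster(points, eps=1.5, min_samples=3, min_clusters=2, max_clusters=5)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the far-away item is labeled -1, which mirrors the note above about DBSCAN discarding items that are far from the rest of the data points. Real vector fields have many more dimensions, so the values of eps and min_samples used here would not carry over.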

When the workflow has finished, go to Datasets to see the resulting field added to your dataset.