AI Text Clustering

Your one-stop shop for turning text data into insight

Turning raw free-text data into insight takes several processing steps. The AI Text Clustering workflow bundles the most common and most useful ones (i.e. vectorizing, clustering, sentiment analysis and the count workflow) and runs them for you, so all you need to do is follow the setup wizard.


AI Clustering

This workflow categorizes your text data and presents the results on the Explorer app. It provides you with optional processing steps such as sentiment analysis. All you need to do is follow the setup wizard.

Note: There are two variants of AI Clustering:

  • One-to-One
    All entries in a dataset are processed. Themes are identified based on the conceptual similarity of each text as a whole. Each entry is assigned to exactly one theme. This is called one-to-one clustering.
    Example:
    Sydney's weather and landscape is amazing => theme = Sydney
  • One-to-Many
    All entries in a dataset are processed. Themes are identified based on the conceptual similarity of the sentences that make up each text. Each entry is assigned to one or more themes. This is called one-to-many clustering.
    Example:
    Sydney's weather and landscape is amazing => theme = Sydney, weather, landscape
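
The difference between the two variants can be illustrated with a toy sketch. This is not how the workflow is implemented: the platform groups text by conceptual similarity, whereas the sketch below substitutes simple keyword overlap so that it stays self-contained. The theme names, keyword sets and function names are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical theme keywords, invented for this example.
# A real system would compare text embeddings instead of counting keywords.
THEME_KEYWORDS = {
    "pricing": {"price", "expensive", "cheap", "cost"},
    "support": {"support", "agent", "helpful", "response"},
    "delivery": {"delivery", "late", "shipping", "arrived"},
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def best_theme(text):
    """Return the theme sharing the most keywords with the text, or None."""
    tokens = tokenize(text)
    scores = {theme: len(tokens & kws) for theme, kws in THEME_KEYWORDS.items()}
    theme = max(scores, key=scores.get)
    return theme if scores[theme] > 0 else None


def one_to_one(text):
    """Score the whole text and assign exactly one theme."""
    return best_theme(text)


def one_to_many(text):
    """Score each sentence separately and collect every distinct theme."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    themes = []
    for sentence in sentences:
        theme = best_theme(sentence)
        if theme is not None and theme not in themes:
            themes.append(theme)
    return themes


entry = (
    "The delivery was late and the shipping box arrived damaged. "
    "The support agent was helpful."
)
print(one_to_one(entry))
print(one_to_many(entry))
```

With this entry, the one-to-one function returns a single theme for the whole text, while the one-to-many function returns one theme per sentence, so the second sentence's topic is no longer lost.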

How to run AI Text Clustering

Once you have uploaded your data, select your dataset and locate "AI Text Clustering" under workflows. If it is not listed on the Overview page (i.e. under "Suggested workflows for this dataset"), you can search for it under Browse Workflows, as shown in the image below.

Relevance AI - Access to AI-Clustering workflow


This will open the setup wizard:

Relevance AI - AI clustering setup


On this page you will:

  1. Select the free text field that you wish to analyse
  2. Select the variant: One-to-One or One-to-Many
    Note that the latter will take longer to finish
  3. Enter the number of categories you wish to see
    Note 1: this number determines how many groups your data is split into
    Note 2: smaller numbers give a high-level overview of the data, whereas larger numbers split the data into more groups
    Note 3: if you already know roughly how many categories the data contains (e.g. about 45 categories in a customer feedback dataset), enter that number. Otherwise we recommend 5% of the size of your dataset (i.e. the number of entries in the dataset). Read our guide on How to select the number of clusters
  4. Optionally, select additional processing steps to run alongside clustering:
    • Sentiment analysis: identify the polarity of the text data
    • Character, word and sentence counts: add text statistics to your data
  5. Optional Settings:
    1. Select the category field(s) to use when presenting data on the Explorer dashboard. AI-categorized entries can be further grouped by fields such as gender, nationality, department or state. This helps you understand the data better.
    2. Select the numeric field(s) to use when presenting data on the Explorer dashboard. AI-categorized entries can be further analysed by metrics such as average age or average NPS score. This helps you understand the data better.
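
The rule of thumb from Note 3 in step 3 can be written as a small helper. This is a minimal sketch, not part of the platform: the function name and parameters are hypothetical, and rounding to the nearest whole number with a minimum of one cluster is an assumption the guide does not specify.

```python
def suggest_cluster_count(n_entries, known_categories=None):
    """Suggest a value for the "number of categories" setting.

    If the approximate number of categories is already known, use it.
    Otherwise fall back to 5% of the dataset size, as recommended above.
    Rounding and the minimum of 1 are assumptions made for this sketch.
    """
    if known_categories is not None:
        return known_categories
    return max(1, round(n_entries * 5 / 100))


# A 2,000-entry dataset with no prior knowledge of its categories
print(suggest_cluster_count(2000))

# The same dataset when roughly 45 categories are expected
print(suggest_cluster_count(2000, known_categories=45))
```

For very small datasets the 5% rule gives a value below one, which is why the sketch never returns less than a single cluster.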

Finally, click "Run workflow" to start it.

You will be directed to the History page, which lists all workflows, transformations and analyses applied to the selected dataset. Wait for "AI Text Clustering" to complete, then use the "View results in dashboard" button to see the results in an auto-generated insight dashboard. Read more about Explorer features and how to personalize it.

When all steps have finished, you will receive an email notification.

Note 1: You do not need to keep the page open while the workflow is in progress. You can close the window or explore other functionalities of the platform.

Note 2: You can run multiple workflows in parallel (i.e. no need to wait for one to finish).

Note 3: Workflow results are saved back to the dataset.

Note 4: Workflow results are independent of each other, meaning they do not overwrite each other unless a workflow is run twice with the exact same setup.

Relevance AI also offers another one-to-many approach to theme identification. See our guide on AI Tagging.