Audio Use Case

A guide to implementing Relevance AI's audio processing use case, including data structure and sample questions.

Problem Statement

Summarizing and understanding what was talked about is often key to unlocking insight from audio data (e.g. focus groups, phone calls). Identifying emerging themes in conversations often feeds strategic initiatives and executive reporting - such as customer or employee feedback projects and surveys - or ongoing business operations, such as customer service and support teams that need insights presented at team standups or weekly huddles.

Who Are the Typical Editors and Viewers of This Use Case?

  • Editors: Market Researchers, Insight Analysts, Business Analysts, Data Analysts
  • Viewers: Insight Managers, Research Managers, Project Managers, Team Leaders, Customer Teams, Chief Experience Officers, Chief Customer Officers

How Does It Work?

Audio files are processed using AI and results are automatically saved in your account. Key results are audio transcription, speaker diarization, sentiment analysis of spoken sentences, audio chapters, keywords, gist and summary extraction.

Example output produced by Relevance AI

Data to analyze: Audio files from interviews on "What do you think about interest rate rises?"

With 3 interview audio files containing multiple speakers (i.e. a moderator and interviewees), we can process the audio and analyze the data for key themes and insights. An example output is shown in the image below: audio transcription along with sentence splitting, keyword highlighting, speaker diarization (i.e. speaker identification), sentiment and entity analysis.

Relevance AI - A sample dataset composed of sentiment audio processing results


Another example output is shown in the image below, where the key quotes are highlighted and themes are automatically assigned.

Relevance AI - Interview AI's dashboard


Relevance AI transcribes and labels my audio; what else will it extract?

Transcribing audio, automatically extracting the main quotes, and labelling speakers and themes frees us from the need to repeatedly listen to the files and take notes. However, this is not the end: the next, and possibly most important, step is to extract insight from that information. Some of the most informative fields are:

  • Quote or Text Field - Transcription
    A text field, containing what was talked about. This includes the moderator and interviewees/guests. You can further tag or cluster this data.
  • Speaker label
We might want to focus only on what the interviewees/guests said and filter out the moderator. Or, if we are interested in only one speaker, we can filter everyone else out.
  • Themes
    Automatically assigned themes based on the interview
  • Highlights
    Keywords identifying the general themes
  • Sentiment
    A key identifier of positive, neutral or negative sentiment. Typically used to filter/drill down into potential areas of concern or improvement.
  • Entity
In some cases, there is high value in identifying the entities (people, places, etc.) that have been mentioned in a conversation.
  • Summary
When breaking the input audio into chapters, you will have a summary field as well. It is also possible to cluster or tag the summary instead of working on the full transcription.
  • Gist
    When breaking the input audio into chapters, you will have a gist field as well. This field is another great identifier of the theme.
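Once these fields are exported (for example to a CSV), the speaker and sentiment filters described above take only a few lines. A minimal sketch follows; the field names (speaker_label, sentiment, text) and sample rows are illustrative, not Relevance AI's exact schema:

```python
# Illustrative processed results: one dict per transcribed sentence.
# Field names are assumptions, not Relevance AI's exact export schema.
results = [
    {"speaker_label": "A", "sentiment": "neutral",
     "text": "Welcome, thanks for joining today."},
    {"speaker_label": "B", "sentiment": "negative",
     "text": "The rate rises have really stretched our budget."},
    {"speaker_label": "C", "sentiment": "positive",
     "text": "Our savings finally earn something again."},
]

# Speaker A is the moderator, so drop them to focus on interviewees.
interviewees = [r for r in results if r["speaker_label"] != "A"]

# Drill down into potential areas of concern via negative sentiment.
concerns = [r["text"] for r in interviewees if r["sentiment"] == "negative"]
print(concerns)
```

The same pattern extends to any of the fields above, e.g. keeping only rows whose themes or entities match a topic of interest.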

What Sort Of Insights Can Relevance AI Help Me Uncover?

Identify Key Themes

  • Understand emerging themes driving customer feedback / satisfaction
  • Summarize open-ended responses with both high-level tags and granular sub-tags

Understand Feedback Patterns

  • Filter by customer sentiment: positive, negative, neutral
  • Pinpoint feedback by emotion: e.g. anger, disappointment, frustration

Uncover New Opportunities

  • Understand satisfaction across key customer demographics
  • Break down satisfaction by channel, product and region

How To Get Started: Audio Use Case

A brief overview of the main steps is provided below:

  1. Save your audio file(s) in one of the common audio formats - mp3 is recommended
  2. Your audio file must be accessible via an HTTP(S) link. Use your preferred hosting method, include the URL(s) in your CSV and upload the CSV file to Relevance AI - or simply use Upload Media, which takes care of this step
  3. Analyze your audio file(s) via Transcribe Audio and Identify Speakers or Interview AI
  4. Further analyze the extracted text fields, for example by running AI Clustering or AI Tagging; clustering categorizes what was talked about into different groups, whereas tagging labels the data based on selected code frames
  5. Build and extract insights using Relevance AI's Explorer dashboard
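For step 2, the CSV of hosted audio links can be produced with a short script. A sketch is below; the file name, column name and URLs are placeholders to replace with your own:

```python
import csv

# Placeholder hosted audio URLs - replace with your own HTTP(S) links.
audio_urls = [
    "https://example.com/audio/interview-1.mp3",
    "https://example.com/audio/interview-2.mp3",
    "https://example.com/audio/interview-3.mp3",
]

# Write one URL per row under a single header column.
# The column name "audio_url" is illustrative, not a required name.
with open("audio_files.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["audio_url"])
    for url in audio_urls:
        writer.writerow([url])
```

The resulting audio_files.csv can then be uploaded to Relevance AI as described above.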


Tips for best results:

  • Save your audio file in a common audio format such as mp3
  • Make sure the moderator is the first person heard in the audio, so Speaker A is always the moderator/interviewer; this helps you filter your data later
  • Make sure people do not speak over each other when recording the audio

Related Articles

How to prepare data

How to upload data

How to Upload Media

Quick guide to find and create insights

AI Clustering

AI Tagging

Explorer dashboard

Common questions

How do you deal with accents?

The AI model used for audio transcription is trained on a vast range of accents, so you can expect high-quality transcription as long as the accent is understandable to a general English speaker.

Should I remove the moderator?

You do not need to remove the moderator as long as you make sure the first person heard in the audio file is the moderator (Speaker-A). When the moderator is labeled as Speaker-A, you can filter out all of their transcriptions (if necessary).

How do I find insights/themes for the group vs. split out by individual speaker?

As long as the audio is high quality and speakers do not talk over each other, the model should transcribe well. The next step is to analyze the data. When analyzing, you can either filter some speakers out (i.e. look at insights from specific people) or keep everyone in (i.e. look at insights representative of the group).

What if I have the audio transcribed in a word doc?

You can definitely use your own transcription. However, depending on which workflow you wish to use, you need to save it in a required format. For instance, under Interview AI, apart from common audio and video formats, you can upload your data as a PDF or a CSV file.

Below is a small example of how to transform your transcript from Word format to a CSV. Your transcript as a document (e.g. a Word document) may look like this:

Speaker 1
Hello. Nice to finally e-meet you
Speaker 2
Hi. Nice to meet you too.
Speaker 1
Lets start with your ...

The CSV equivalent will be:

Speaker,Text
1,Hello. Nice to finally e-meet you
2,Hi. Nice to meet you too.
1,Lets start with your ...
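If you have many transcripts, this transformation can be automated. A minimal Python sketch follows, assuming the document alternates "Speaker N" lines with that speaker's utterances (as in the example above):

```python
import csv
import io

def transcript_to_csv(transcript: str) -> str:
    """Convert an alternating 'Speaker N' / utterance transcript
    (as copied from a Word document) into CSV text with
    Speaker and Text columns."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Speaker", "Text"])
    speaker = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Speaker "):
            # Remember the current speaker, e.g. "1" or "2".
            speaker = line.split(" ", 1)[1]
        elif speaker is not None:
            # Attribute the utterance to the current speaker.
            writer.writerow([speaker, line])
    return out.getvalue()

sample = """Speaker 1
Hello. Nice to finally e-meet you
Speaker 2
Hi. Nice to meet you too.
Speaker 1
Lets start with your ..."""

print(transcript_to_csv(sample))
```

The printed CSV can be saved to a file and uploaded to Relevance AI like any other CSV.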