Analyze Audio File(s)

Step-by-step guide on analyzing audio files

This page explains how to transcribe audio files on Relevance AI and apply text analysis, such as clustering and tagging, to the transcription to extract insights.


Audio Core

A workflow that provides you with:

  • audio transcription
  • speaker diarization
  • utterances

How to use the audio workflow

1. Upload your dataset

You can upload your audio file(s) to the platform using the Upload media option. Alternatively, if your audio files are already hosted on the web (i.e. accessible via an HTTP(S) URL), include their URLs in a CSV file (as shown here) and upload the CSV to Relevance AI.
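If you choose the CSV route, a minimal sketch of building that file is shown below. The column name audio_url and the example URLs are assumptions; use whatever header and links match your own files.

```python
import csv

# Hypothetical list of publicly accessible audio file URLs.
audio_urls = [
    "https://example.com/audio/call-001.mp3",
    "https://example.com/audio/call-002.mp3",
]

# Write one URL per row; the "audio_url" column name is an
# assumption -- any clear header will do.
with open("audio_files.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["audio_url"])
    for url in audio_urls:
        writer.writerow([url])
```

The resulting audio_files.csv can then be uploaded to Relevance AI in place of the raw media files.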

Note: When uploading your media files to Relevance AI, large audio files may take some time; allow the upload process to complete.

As a result, you will have a dataset in which each entry represents an audio file and includes a URL pointing to where the file is stored.

2. Transcribe your audio file(s)

Select your audio dataset, go to Workflows, and locate Audio Core. After completing the workflow wizard and successfully running the workflow, a new dataset called <Original-Dataset-Name>_utterance will appear in your account. For instance, if the original dataset is named audio_dataset, a new dataset named audio_dataset_utterance will be added to your account.

This new dataset includes the following main fields/columns:

  • Text: the transcription of the audio file(s)
  • Speaker: labels (A, B, C, etc.) assigned to the voices heard in the audio
  • Start: the time in the audio at which a spoken piece (utterance) begins
  • End: the time in the audio at which a spoken piece (utterance) ends
  • File Name: the original file name

Note 1: Transcription can take a while depending on the length of your audio file (for example, 1.5 hours of audio takes around 20 minutes).
Note 2: You might need to refresh the page for <Original-Dataset-Name>_utterance to appear under Datasets.
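To make the field layout concrete, here is a small sketch of how utterance rows like these could be summarized once exported, for example to compute each speaker's total talk time from the Start and End values. The sample utterances are invented for illustration.

```python
# Hypothetical utterance rows mirroring the fields described above.
utterances = [
    {"Text": "Hi, thanks for calling.", "Speaker": "A", "Start": 0.0, "End": 2.1},
    {"Text": "Hello, I have a question.", "Speaker": "B", "Start": 2.3, "End": 4.8},
    {"Text": "Sure, go ahead.", "Speaker": "A", "Start": 5.0, "End": 6.2},
]

# Total talk time per speaker: End - Start summed over each
# speaker's utterances.
talk_time = {}
for u in utterances:
    duration = u["End"] - u["Start"]
    talk_time[u["Speaker"]] = talk_time.get(u["Speaker"], 0.0) + duration
```

With the rows above, speaker A accumulates about 3.3 seconds and speaker B about 2.5 seconds.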



Even though the audio analysis results are written back to the original dataset, use the new dataset, found under <Original-Dataset-Name>_utterance, for further processing of your data.

3. What's next

Some common next steps are:

  • Process the Text field (i.e. the transcription) for insights. This is similar to any other text processing: select the <Original-Dataset-Name>_utterance dataset and apply AI Clustering or AI Tagging to the Text field.
  • Analyze your data and visualize the insights on the Explorer dashboard.
  • Export your data: this can be done directly through an export workflow or on the Explorer dashboard. For the latter, set up a categorical view (you can choose any category, such as the Speaker field), then use export, which generates a CSV file including the fields you select for download.

Note 1: Utterances in the downloaded file might not be in chronological order. We recommend including the Start field in your export so you can sort the data accordingly.

Note 2: If you are working with multiple audio files, you can use the File Name field to separate the transcriptions.
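Both notes can be handled in one pass over the exported CSV: sort the rows by Start and group them by File Name. The sketch below assumes an export containing the Text, Speaker, Start, and File Name columns described earlier; the file name and function name are illustrative.

```python
import csv
from collections import defaultdict

def ordered_transcripts(path):
    """Read an exported utterance CSV, restore chronological order,
    and group the lines by source audio file.

    Assumes Text, Speaker, Start, and File Name columns; the
    function name is a placeholder, not a platform API.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Start is exported as text, so convert it for numeric sorting.
    rows.sort(key=lambda r: float(r["Start"]))
    by_file = defaultdict(list)
    for r in rows:
        by_file[r["File Name"]].append(f"{r['Speaker']}: {r['Text']}")
    return dict(by_file)
```

Calling ordered_transcripts("export.csv") would return one ordered "Speaker: Text" transcript list per audio file.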