Document AI

Extract data from unstructured PDFs

Document AI extracts structured data from unstructured sources such as PDFs. You can use it to pull names, dates, locations, and other relevant information out of large volumes of documents.

Example use cases

Small business

You have a stack of invoices. You need to extract information like customer names, addresses and payment details.

Market research

You have a stack of reports. You need to extract information like customer demographics, purchasing behaviour, and market trends.

In both cases, you tell Document AI which data you need, and it extracts that data from each document automatically, saving you the time you would otherwise spend copying it out by hand.

How to use: Dashboard

Document AI can be used via the dashboard.

Documentation coming soon

How to use: API

For advanced use cases, such as bulk analysis, an API is available.

This guide walks you through using the Relevance AI API to run Document AI over many PDFs at once. By following these steps, you'll extract data from multiple PDFs and receive the results in a structured format.

Prerequisites

  • A Relevance AI account, so you can obtain an API key from cloud.relevanceai.com/sdk/api.
    If you haven't signed up for Relevance AI yet, sign up first.
  • A list of URLs to the PDF documents you want to analyse.
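Because the API receives URLs rather than uploaded files, a malformed entry in your list is easy to miss. The sketch below checks each entry with the standard URL constructor and flags anything that isn't an http or https URL. The function name findInvalidFileUrls is hypothetical, restricting entries to http and https is this sketch's own assumption, and the check is syntactic only: it does not confirm that a URL points to a PDF or that Relevance AI can reach it.

```javascript
// Hypothetical helper: returns the entries in `files` that are not
// well-formed http(s) URLs. Syntax check only; it does not fetch the
// documents or confirm they are PDFs. Allowing only http/https is an
// assumption of this sketch.
function findInvalidFileUrls(files) {
  return files.filter((entry) => {
    try {
      const url = new URL(entry);
      return url.protocol !== 'http:' && url.protocol !== 'https:';
    } catch {
      // new URL() throws a TypeError for strings it cannot parse.
      return true;
    }
  });
}

// Usage example with placeholder URLs.
const invalid = findInvalidFileUrls([
  'https://example.com/invoice-001.pdf',
  'ftp://example.com/report.pdf',
  'not a url',
]);
console.log(invalid); // [ 'ftp://example.com/report.pdf', 'not a url' ]
```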

Steps

1. Obtain Your API Key

Go to the SDK page in the dashboard and copy your API key.

[Screenshot: the SDK page at cloud.relevanceai.com/sdk/api]
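Once you have copied the key, you may prefer to keep it out of your source code. The following Node.js sketch reads it from an environment variable. The variable name RELEVANCE_API_KEY and the function name loadApiKey are illustrative choices, not something Relevance AI requires.

```javascript
// Hypothetical helper: reads the Relevance AI API key from an environment
// variable. RELEVANCE_API_KEY is an illustrative name chosen for this sketch.
function loadApiKey(env = process.env, name = 'RELEVANCE_API_KEY') {
  const key = env[name];
  if (typeof key !== 'string' || key.trim() === '') {
    throw new Error(`Missing API key: set the ${name} environment variable.`);
  }
  // Strip stray whitespace picked up when pasting the key.
  return key.trim();
}
```

You could then run your script as RELEVANCE_API_KEY=your-api-key node script.js and pass loadApiKey() wherever the examples use apiKey.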

2. Trigger the Workflow

To trigger the Document AI workflow, send a POST request to the following endpoint:

POST https://api-f1db6c.stack.relevanceai.com/latest/workflows/trigger

Request Headers

Name | Description
--- | ---
Authorization | The API key you obtained in Step 1.
Content-Type | application/json

Request Body Schema

Name | Type | Description
--- | --- | ---
workflow_id | string | Set this to document_ai.
params | object | An object containing the following keys: files, columns, n_rows, and send_email.

Params

Name | Type | Description
--- | --- | ---
files | string[] | An array of URLs to the PDF documents you want to analyse.
columns | string[] | The headers of the data you want to extract, one string per field (for example, "customer name").
n_rows | number | The number of items (rows) you want to extract from each document.
send_email | boolean | Set to true to have the workflow email the account holder when the job is complete.
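The tables above translate directly into a JSON request body. This sketch assembles that body and runs a few type checks first, so mistakes surface before any request is sent. The helper name buildDocumentAiRequest is hypothetical, and its rules (non-empty arrays, a positive integer for n_rows, send_email defaulting to false) are assumptions of this sketch rather than documented API constraints.

```javascript
// Hypothetical helper: builds the trigger request body from the documented
// fields. The validation rules and the sendEmail default are assumptions.
function buildDocumentAiRequest(files, columns, nRows, sendEmail = false) {
  const isStringArray = (value) =>
    Array.isArray(value) && value.length > 0 && value.every((v) => typeof v === 'string');

  if (!isStringArray(files)) {
    throw new TypeError('files must be a non-empty array of URL strings');
  }
  if (!isStringArray(columns)) {
    throw new TypeError('columns must be a non-empty array of strings');
  }
  if (!Number.isInteger(nRows) || nRows < 1) {
    throw new TypeError('n_rows must be a positive integer');
  }
  if (typeof sendEmail !== 'boolean') {
    throw new TypeError('send_email must be a boolean');
  }

  return {
    workflow_id: 'document_ai',
    params: { files, columns, n_rows: nRows, send_email: sendEmail },
  };
}

// Usage example based on the invoice use case (placeholder URL).
const body = buildDocumentAiRequest(
  ['https://example.com/invoice-001.pdf'],
  ['customer name', 'address', 'payment details'],
  5
);
console.log(JSON.stringify(body, null, 2));
```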

JavaScript Example

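// Starts a Document AI job and resolves to its job_id.
// Uses the global fetch (available in browsers and Node.js 18+).
async function triggerWorkflow(apiKey, files, columns, nRows, sendEmail) {
  const endpoint = 'https://api-f1db6c.stack.relevanceai.com/latest/workflows/trigger';

  const requestBody = {
    workflow_id: 'document_ai',
    params: {
      files: files,          // URLs of the PDFs to analyse
      columns: columns,      // headers of the data to extract
      n_rows: nRows,         // number of items to extract per document
      send_email: sendEmail  // email the account holder when the job completes
    }
  };

  const response = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': apiKey
    },
    body: JSON.stringify(requestBody)
  });

  // fetch resolves even for HTTP error statuses, so check them explicitly.
  if (!response.ok) {
    throw new Error(`Trigger request failed: ${response.status} ${await response.text()}`);
  }

  const data = await response.json();
  return data.job_id;
}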
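For bulk runs, a transient network error on a single request shouldn't abort the whole analysis. The sketch below wraps any async function, such as triggerWorkflow, in a retry loop with exponential backoff. It is a generic pattern, not part of the Relevance AI API: withRetries is a hypothetical name, the attempt count and delays are arbitrary, and this guide doesn't document rate limits or which errors are safe to retry. Note that retrying a trigger request could start a duplicate job if the first attempt actually reached the server.

```javascript
// Generic retry helper with exponential backoff (hypothetical, not part of
// the Relevance AI API). Calls `fn` up to `attempts` times, waiting
// baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ... between tries, and
// rethrows the last error if every attempt fails.
async function withRetries(fn, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage with triggerWorkflow:
// const jobId = await withRetries(() =>
//   triggerWorkflow(apiKey, files, columns, nRows, sendEmail));
```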

3. Poll the Job Status

After triggering the workflow, you'll receive a job_id in the response. Use this job_id to poll the job status by sending a POST request to the following endpoint:

POST https://api-f1db6c.stack.relevanceai.com/latest/workflows/{job_id}/get

Request Headers

Name | Description
--- | ---
Authorization | The API key you obtained in Step 1.

JavaScript Example

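// Fetches the current status of a Document AI job.
// Uses the global fetch (available in browsers and Node.js 18+).
async function getJobStatus(apiKey, jobId) {
  const endpoint = `https://api-f1db6c.stack.relevanceai.com/latest/workflows/${jobId}/get`;

  const response = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': apiKey
    }
  });

  // fetch resolves even for HTTP error statuses, so check them explicitly.
  if (!response.ok) {
    throw new Error(`Status request failed: ${response.status} ${await response.text()}`);
  }

  const data = await response.json();
  return data; // includes status and, once complete, output.results
}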
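A simple approach is to poll at a fixed interval, but for long-running bulk jobs you may want to lengthen the wait between checks and give up after a deadline. The sketch below is a generic helper, not part of the Relevance AI API: pollWithBackoff is a hypothetical name, and the default delays and ten-minute timeout are arbitrary. It takes the status-fetching function and a completion test as arguments, so it can be exercised without network access.

```javascript
// Hypothetical polling helper: calls `getStatus` until `isDone(status)` is
// true, doubling the wait between polls up to `maxDelayMs`, and throws
// if the next wait would run past `timeoutMs`.
async function pollWithBackoff(getStatus, isDone, {
  initialDelayMs = 2000,
  maxDelayMs = 30000,
  timeoutMs = 10 * 60 * 1000,
} = {}) {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  for (;;) {
    const status = await getStatus();
    if (isDone(status)) return status;
    if (Date.now() + delay > deadline) {
      throw new Error(`Polling timed out after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs);
  }
}

// Usage with getJobStatus ('complete' is the status value this guide's
// main example waits for):
// const finalStatus = await pollWithBackoff(
//   () => getJobStatus(apiKey, jobId),
//   (s) => s.status === 'complete'
// );
```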

4. Retrieve the Results

Once the job status is complete, the results are in the output key of the returned status object, under results (output.results). This is an array of objects whose keys are the headers you asked to extract and whose values are the extracted data. A link to the generated CSV is in the email key, under secondary_cta.url (email.secondary_cta.url).

JavaScript Example

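async function main() {
  const apiKey = 'your-api-key';                     // from Step 1
  const files = ['url1', 'url2', 'url3'];            // URLs of the PDFs to analyse
  const columns = ['header1', 'header2', 'header3']; // data to extract
  const nRows = 10;
  const sendEmail = false;

  const jobId = await triggerWorkflow(apiKey, files, columns, nRows, sendEmail);
  console.log(`Job ID: ${jobId}`);

  // Poll every 5 seconds, but stop after maxAttempts checks so a job that
  // never reaches 'complete' (for example, if it fails) cannot loop forever.
  const pollIntervalMs = 5000;
  const maxAttempts = 120; // roughly 10 minutes; adjust for large batches
  let jobStatus = await getJobStatus(apiKey, jobId);

  for (let attempt = 1; jobStatus.status !== 'complete'; attempt++) {
    if (attempt >= maxAttempts) {
      throw new Error(`Job ${jobId} not complete after ${attempt} checks (last status: ${jobStatus.status})`);
    }
    console.log(`Job status: ${jobStatus.status}`);
    await new Promise(resolve => setTimeout(resolve, pollIntervalMs));
    jobStatus = await getJobStatus(apiKey, jobId);
  }

  console.log('Job complete');
  console.log('Results:', jobStatus.output.results);
  // Optional chaining avoids a TypeError in case the email key is absent.
  console.log('CSV URL:', jobStatus.email?.secondary_cta?.url);
}

main().catch(error => {
  console.error('Error:', error);
});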
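The CSV link in the status object's email key is one way to get a spreadsheet; you can also build a CSV yourself from output.results. This sketch assumes each result is a flat object keyed by the column headers you requested, as described in Step 4. It quotes fields following the common RFC 4180 convention: fields containing commas, double quotes, or line breaks are wrapped in double quotes, and embedded quotes are doubled. The function name resultsToCsv is hypothetical.

```javascript
// Hypothetical helper: converts output.results (assumed to be an array of
// flat objects keyed by the requested column headers) into a CSV string.
// Missing values become empty fields; fields are quoted only when needed.
function resultsToCsv(results, columns) {
  const escape = (value) => {
    const text = value === null || value === undefined ? '' : String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const lines = [columns.map(escape).join(',')];
  for (const row of results) {
    lines.push(columns.map((column) => escape(row[column])).join(','));
  }
  // RFC 4180 uses CRLF line endings.
  return lines.join('\r\n');
}

// Usage example with made-up values.
console.log(resultsToCsv(
  [{ 'customer name': 'Acme, Inc.', address: '1 Main St' }],
  ['customer name', 'address']
));
```

Passing the same columns array you sent in the trigger request keeps the CSV's column order aligned with your request.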