Document AI
Extract data from unstructured PDFs
Document AI enables you to extract structured data from unstructured data sources such as PDFs. With this feature, you can extract data such as names, dates, locations, and other relevant information from large volumes of unstructured data with ease.
Example use-cases
Small business
You have a stack of invoices. You need to extract information like customer names, addresses and payment details.
Market research
You have a stack of reports. You need to extract information like customer demographics, purchasing behaviour, and market trends.
Document AI lets you specify the data you need, then automatically extracts it from your documents with high accuracy. This saves you a vast amount of time and effort.
How to use: Dashboard
Document AI can be used via the dashboard.
Documentation coming soon
How to use: API
For advanced use-cases (e.g. bulk analysis), an API is available.
This guide will walk you through using the Relevance AI API to conduct bulk analysis of PDFs using Document AI. By following these steps, you'll be able to extract data from multiple PDFs and receive the results in a structured format.
Prerequisites
- Access to cloud.relevanceai.com/sdk/api to obtain an API key. Haven't signed up for Relevance AI yet? Sign up here!
- A list of URLs to the PDF documents you want to analyse.
Steps
1. Obtain Your API Key
Go to the SDK page in the dashboard and copy your API key.

2. Trigger the Workflow
To trigger the Document AI workflow, send a POST request to the following endpoint:
POST https://api-f1db6c.stack.relevanceai.com/latest/workflows/trigger
Request Headers
Name | Description |
---|---|
Authorization | The API key you obtained in Step 1. |
Request Body Schema
Name | Type | Description |
---|---|---|
workflow_id | string | Set this to document_ai. |
params | object | An object containing the following keys: files, columns, n_rows and send_email. |
Params
Name | Type | Description |
---|---|---|
files | string[] | An array of URLs to the PDF documents you want to analyse. |
columns | string[] | An array of strings representing the headers or data you want to extract from the documents. |
n_rows | number | The number of items you want to extract from each document. |
send_email | boolean | Set to true if you want the workflow to send an email to the account holder when it is complete. |
JavaScript Example
// Triggers the Document AI workflow and returns the job ID used for polling.
async function triggerWorkflow(apiKey, files, columns, nRows, sendEmail) {
  const endpoint = 'https://api-f1db6c.stack.relevanceai.com/latest/workflows/trigger';
  const requestBody = {
    workflow_id: 'document_ai',
    params: {
      files: files,
      columns: columns,
      n_rows: nRows,
      send_email: sendEmail
    }
  };
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': apiKey
    },
    body: JSON.stringify(requestBody)
  });
  // Fail early on HTTP errors instead of returning an undefined job ID.
  if (!response.ok) {
    throw new Error(`Trigger request failed with HTTP status ${response.status}`);
  }
  const data = await response.json();
  return data.job_id;
}
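Checking the parameters from the Params table before sending the request can catch mistakes early. The following is a minimal sketch: `buildDocumentAiRequest` is a hypothetical helper, not part of the Relevance AI API, and its validation rules (non-empty arrays, absolute URLs, a positive integer for `n_rows`) are assumptions rather than documented API constraints.

```javascript
// Hypothetical helper: builds the request body for the document_ai workflow.
// The validation rules below are assumptions, not documented API behaviour.
function buildDocumentAiRequest(files, columns, nRows, sendEmail = false) {
  if (!Array.isArray(files) || files.length === 0) {
    throw new TypeError('files must be a non-empty array of PDF URLs');
  }
  for (const file of files) {
    // new URL() throws a TypeError if the string is not an absolute URL.
    new URL(file);
  }
  const columnsValid =
    Array.isArray(columns) &&
    columns.length > 0 &&
    columns.every(col => typeof col === 'string' && col.trim() !== '');
  if (!columnsValid) {
    throw new TypeError('columns must be a non-empty array of non-empty strings');
  }
  if (!Number.isInteger(nRows) || nRows < 1) {
    throw new RangeError('nRows must be a positive integer');
  }
  return {
    workflow_id: 'document_ai',
    params: {
      files: files,
      columns: columns,
      n_rows: nRows,
      send_email: Boolean(sendEmail)
    }
  };
}
```

The returned object has the shape described in the Request Body Schema and can be passed to `JSON.stringify` as the body of the POST request.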
3. Poll the Job Status
After triggering the workflow, you'll receive a job_id in the response. Use this job_id to poll the job status by sending a POST request to the following endpoint:
POST https://api-f1db6c.stack.relevanceai.com/latest/workflows/{job_id}/get
Request Headers
Name | Description |
---|---|
Authorization | The API key you obtained in Step 1. |
JavaScript Example
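// Fetches the current status object for a previously triggered job.
async function getJobStatus(apiKey, jobId) {
  const endpoint = `https://api-f1db6c.stack.relevanceai.com/latest/workflows/${jobId}/get`;
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': apiKey
    }
  });
  // Surface HTTP errors rather than parsing an error body as a status object.
  if (!response.ok) {
    throw new Error(`Status request failed with HTTP status ${response.status}`);
  }
  const data = await response.json();
  return data;
}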
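A job may take a while to finish, so it can be worth putting an upper bound on how long you poll. The sketch below is one way to do that. `pollUntilComplete` and its `intervalMs` and `maxAttempts` options are hypothetical names chosen for illustration, and the only status value it relies on is `complete`, as described in this guide.

```javascript
// Hypothetical helper: calls `fetchStatus` repeatedly until the returned
// object has status 'complete', or gives up after `maxAttempts` calls.
// `fetchStatus` is any async function that resolves to a status object,
// for example: () => getJobStatus(apiKey, jobId)
async function pollUntilComplete(fetchStatus, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status === 'complete') {
      return job;
    }
    // Wait before the next attempt, but not after the final one.
    if (attempt < maxAttempts) {
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job did not complete after ${maxAttempts} attempts`);
}
```

Passing the status call as a function keeps the helper independent of the HTTP layer, so it can be exercised with a stub before it is pointed at the real endpoint.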
4. Retrieve the Results
Once the job status is complete, the results can be found in the output key of the returned status object, under the results key. This will be an array of objects with keys corresponding to the headers you wanted to extract and values corresponding to the data extracted. A link to the generated CSV can be found in the email key, under secondary_cta.url.
JavaScript Example
async function main() {
  const apiKey = 'your-api-key';
  const files = ['url1', 'url2', 'url3'];
  const columns = ['header1', 'header2', 'header3'];
  const nRows = 10;
  const sendEmail = false;
  // Start the workflow and keep the job ID for polling.
  const jobId = await triggerWorkflow(apiKey, files, columns, nRows, sendEmail);
  console.log(`Job ID: ${jobId}`);
  // Poll every 5 seconds until the job reports the 'complete' status.
  let jobStatus = await getJobStatus(apiKey, jobId);
  while (jobStatus.status !== 'complete') {
    console.log(`Job status: ${jobStatus.status}`);
    await new Promise(resolve => setTimeout(resolve, 5000));
    jobStatus = await getJobStatus(apiKey, jobId);
  }
  console.log('Job complete');
  // Extracted rows live under output.results; the CSV link under email.secondary_cta.url.
  console.log('Results:', jobStatus.output.results);
  console.log('CSV URL:', jobStatus.email.secondary_cta.url);
}
main().catch(error => {
  console.error('Error:', error);
});
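The result fields described in this step sit a few levels deep in the status object, so a small accessor can make them easier to consume. This is a sketch only: `readJobResults` is a hypothetical helper, and it assumes that the `output` or `email` keys may be absent on some responses, which this guide does not confirm.

```javascript
// Hypothetical helper: pulls the extracted rows and the CSV link out of a
// completed status object, using the key paths described in this guide.
function readJobResults(job) {
  if (job.status !== 'complete') {
    throw new Error(`Job is not complete (status: ${job.status})`);
  }
  return {
    // Array of objects keyed by the requested column headers.
    results: job.output?.results ?? [],
    // Link to the generated CSV, or null if the key is missing (assumption).
    csvUrl: job.email?.secondary_cta?.url ?? null
  };
}
```

For example, `const { results, csvUrl } = readJobResults(jobStatus);` gives you the rows for further processing without repeating the key paths throughout your code.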