How to update an existing dataset

In various situations, one might need to make changes to an existing dataset. Common modifications are:

  • Adding new entries (i.e. rows) to an existing dataset. For instance, when a survey is run periodically (e.g. quarterly tracking projects) and new responses need to be added to an existing dataset.
  • Adding new fields (i.e. columns) to an existing dataset. For instance, when you want to add extra information per entry (e.g. date, gender, address, new responses) so that it is available for analysis.
  • Modifying existing values in a dataset. For instance, when some values contain mistakes or the values under a field have changed (e.g. updates to an address field).

Adding new items (i.e. rows) to an existing dataset

This is a simple task consisting of the following steps:

  1. Prepare your CSV file the same way that you did initially (i.e. for the existing dataset).
    Note: For the new data to sit properly under the headers in the existing dataset, the headers in the original CSV file and the new CSV file must be exactly the same.
  2. On the platform, select the existing dataset to which you wish to add new data
  3. Click on "Add data" at the top of the page
Relevance AI - How to view my data and access Add to a dataset

🚧

Headers must be EXACTLY the same between data batches for the new data to sit properly in an existing dataset

When updating an existing dataset with a new batch of data, headers must be exactly the same between the old and the new CSV. Otherwise, the mismatched headers will be added to your dataset as new columns.

Keep in mind that the platform is case-sensitive. For example "Name" and "name" are considered as two different headers.

  4. Drag and drop the new CSV file and your new entries will be added to the existing dataset.
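
Before uploading a new batch, it can help to compare its headers against the original CSV locally. The following is a minimal Python sketch of such a check, using only the standard library. The function names (read_header, compare_headers) and the sample data are hypothetical and are not part of the Relevance AI platform.

```python
import csv
import io


def read_header(csv_text):
    """Return the header row (first row) of a CSV given as text."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def compare_headers(old_header, new_header):
    """Compare two header rows case-sensitively.

    Returns (missing, unexpected, case_mismatches):
    - missing: headers present only in the old batch
    - unexpected: headers present only in the new batch
      (these would show up as new columns after upload)
    - case_mismatches: (old, new) pairs that differ only by letter case
    """
    old_set, new_set = set(old_header), set(new_header)
    missing = sorted(old_set - new_set)
    unexpected = sorted(new_set - old_set)
    old_by_lower = {h.lower(): h for h in missing}
    case_mismatches = [
        (old_by_lower[h.lower()], h)
        for h in unexpected
        if h.lower() in old_by_lower
    ]
    return missing, unexpected, case_mismatches


# Hypothetical sample data: "Name" vs "name" differ only by case.
old_csv = "Name,Age,City\nAda,36,London\n"
new_csv = "name,Age,City\nGrace,45,New York\n"

missing, unexpected, case_mismatches = compare_headers(
    read_header(old_csv), read_header(new_csv)
)
print("Missing from new batch:", missing)
print("Would become new columns:", unexpected)
print("Differ only by case:", case_mismatches)
```

If all three lists are empty, the two files share the same set of headers. To check real files, read them with open(path, newline="") and pass the text to read_header.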

What could be next?

When adding entries to an existing dataset, there is a high chance that the existing dataset has already been tagged or clustered. If you do not want to start from scratch, you can apply any of the following to the new data.


Modifications

Each entry in a dataset on the Relevance AI platform has a unique identifier (_id). The _id field can already exist in a CSV (i.e. be included in the CSV file to be uploaded). Otherwise, the platform automatically adds the field with unique values.

This _id field is your access point to modify existing entries in a dataset.

📘

The _id field is your access point to an individual entry in a dataset.

Either include an _id field with unique values per entry in your CSV file when uploading a dataset, or use the export functionality to access the assigned ids.
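
If you supply your own _id column, a quick local check for duplicate or empty ids can catch problems before upload. The sketch below is one way to do this in Python. The function name find_id_problems and the sample data are hypothetical, and the article does not describe how the platform itself treats duplicate or empty ids.

```python
import csv
import io
from collections import Counter


def find_id_problems(csv_text, id_field="_id"):
    """Return (duplicates, empty_count) for the id column of a CSV.

    duplicates: sorted list of id values that appear more than once
    empty_count: number of rows whose id cell is empty or missing
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if id_field not in (reader.fieldnames or []):
        raise ValueError(f"CSV has no {id_field!r} column")
    # Missing cells come back as None from DictReader, so normalise to "".
    ids = [(row[id_field] or "").strip() for row in reader]
    counts = Counter(v for v in ids if v)
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    empty_count = sum(1 for v in ids if not v)
    return duplicates, empty_count


# Hypothetical sample: id "2" is repeated and the last row has no id.
sample_csv = "_id,Col1\n1,V1\n2,V2\n2,V3\n,V4\n"
duplicates, empty_count = find_id_problems(sample_csv)
print("Duplicate ids:", duplicates)
print("Rows without an id:", empty_count)
```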

Adding new fields (columns) to an existing dataset

  1. Prepare a CSV file that includes an _id header/column and the new field(s) you wish to add to your dataset. The values under _id must be equal to the _id values of the existing entries that you wish to update. To obtain the existing _id values, use the export feature, add the new columns to the exported CSV, and then upload it.
    In the example below, an existing dataset with 3 entries is shown. Each entry has an _id and two fields (Col1 and Col2). We wish to add two new fields (Col3 and Col4) to the dataset. This can be done by uploading a CSV similar to what is shown under "Data to update".
Existing Dataset            Data to update      

_id | Col1  | Col2        _id | Col3  | Col4
------------------       --------------------
  1 |  V1   |  V4          1  |  V7   |  V9
------------------       --------------------
  2 |  V2   |  V5          2  |       |  V10
------------------       --------------------
  3 |  V3   |  V6          3  |  V8   |  
    
 
         Resulting Dataset
         
_id | Col1  | Col2 | Col3  | Col4
--------------------------------------
  1 |  V1   |  V4  |  V7   |  V9
--------------------------------------
  2 |  V2   |  V5  |       |  V10
--------------------------------------
  3 |  V3   |  V6  |  V8   |  
  2. On the platform, select the existing dataset to which you wish to add new fields.
  3. Click on "Add data" at the top of the page.
  4. Drag and drop the new CSV file and your new columns will be added to the existing dataset.
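
The "Data to update" file from the example above can also be produced programmatically from an exported CSV. The following Python sketch assumes the export contains an _id column. The function name build_update_csv and the new_values mapping are hypothetical helpers for illustration, not part of the platform.

```python
import csv
import io


def build_update_csv(exported_csv_text, new_values, new_fields, id_field="_id"):
    """Build CSV text containing the id column plus the new fields only.

    new_values maps an id to a dict of {field: value}; any field without
    a value for a given id is left as an empty cell.
    """
    reader = csv.DictReader(io.StringIO(exported_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[id_field] + list(new_fields), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        entry_id = row[id_field]
        values = new_values.get(entry_id, {})
        out_row = {id_field: entry_id}
        for field in new_fields:
            out_row[field] = values.get(field, "")
        writer.writerow(out_row)
    return out.getvalue()


# Mirrors the example tables: three entries, two new fields.
exported = "_id,Col1,Col2\n1,V1,V4\n2,V2,V5\n3,V3,V6\n"
new_values = {
    "1": {"Col3": "V7", "Col4": "V9"},
    "2": {"Col4": "V10"},
    "3": {"Col3": "V8"},
}
update_csv = build_update_csv(exported, new_values, ["Col3", "Col4"])
print(update_csv)
```

The resulting text can be saved to a .csv file and uploaded through "Add data" as described in the steps above.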

Modifying existing values in a dataset

  1. Prepare a CSV file that includes an _id header/column and the field(s) you wish to modify in your dataset. The values under _id must be equal to the _id values of the existing entries that you wish to update. To obtain the existing _id values, use the export feature, modify the exported data, and then upload it.
    In the example below, an existing dataset with 3 entries is shown. Each entry has an _id and two fields (Col1 and Col2). We wish to modify Col1 in the second entry and Col2 in the third entry.
Existing Dataset            Data to update      

_id | Col1  | Col2        _id | Col1  | Col2
------------------       --------------------
  1 |  V1   |  V4          2  |  V7   |  
------------------       --------------------
  2 |  V2   |  V5          3  |       |  V10
------------------      
  3 |  V3   |  V6        
    
 
Resulting Dataset
         
_id | Col1  | Col2 
-------------------
  1 |  V1   |  V4  
-------------------
  2 |  V7   |  V5  
-------------------
  3 |  V3   |  V10  
  2. On the platform, select the existing dataset whose values you wish to modify.
  3. Click on "Add data" at the top of the page.
  4. Click on "Upload". Drag and drop the new CSV file and the specified values will be updated.

Note 1: You do not have to update all entries. Include only the _id values of the entries that you wish to update (e.g. the first _id in the above example is not included in the update).

Note 2: If a cell is empty in your new CSV file, no modification is applied to its associated entry in the dataset (e.g. Col1 for the third entry in the above example).
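
The update rules in the two notes can be illustrated with a small local simulation. This Python sketch only mimics the behaviour described in this article (match on _id, skip empty cells); it is not the platform's implementation. The function name apply_update is hypothetical, and skipping ids that are not found in the existing data is an assumption of the sketch.

```python
def apply_update(existing, updates, id_field="_id"):
    """Return a new list of rows with non-empty update cells applied.

    existing and updates are lists of dicts, as produced by csv.DictReader.
    Rows not mentioned in updates are left untouched (Note 1), and empty
    cells in updates do not overwrite existing values (Note 2).
    """
    by_id = {row[id_field]: dict(row) for row in existing}
    for update in updates:
        target = by_id.get(update[id_field])
        if target is None:
            # Assumption: unknown ids are ignored in this sketch.
            continue
        for field, value in update.items():
            if field != id_field and value != "":
                target[field] = value
    return list(by_id.values())


# Mirrors the example tables above.
existing = [
    {"_id": "1", "Col1": "V1", "Col2": "V4"},
    {"_id": "2", "Col1": "V2", "Col2": "V5"},
    {"_id": "3", "Col1": "V3", "Col2": "V6"},
]
updates = [
    {"_id": "2", "Col1": "V7", "Col2": ""},
    {"_id": "3", "Col1": "", "Col2": "V10"},
]
result = apply_update(existing, updates)
for row in result:
    print(row)
```

Running the sketch on the example data reproduces the "Resulting Dataset" table: entry 1 is unchanged, entry 2 gets a new Col1, and entry 3 gets a new Col2.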