When working with textual data, it is recommended (though not required) to apply certain preprocessing steps, which can improve the analysis results. Common text preprocessing steps are:
- Stop word removal: removing words that are frequent but carry little meaning on their own (e.g. "the", "there").
- Stemming: replacing words with their word stem, which need not be a real word (e.g. "changes" and "changing" both become "chang").
- Lemmatization: replacing words with their dictionary form, or lemma (e.g. "changes" and "changing" both become "change").
- Lowercasing: converting all characters to their lowercase form
- Text cleaning: this step is entirely data-specific. Common examples are removing HTML tags, URLs, or hashtags.
- Breaking into shorter pieces of text: when automatically analyzing text, processing smaller pieces of text (e.g. a sentence vs paragraph) often produces more precise results.
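
Several of the steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a recommended implementation: the `STOP_WORDS` set, the regular expressions, and the function names are all assumptions made for the example. Stemming and lemmatization are left out because they normally rely on a dedicated NLP library.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "there", "a", "an", "is", "are", "and", "or",
              "of", "to", "in", "for"}


def clean_text(text: str) -> str:
    """Text cleaning: drop HTML tags, URLs and hashtags, then tidy whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"#\w+", " ", text)          # hashtags
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list:
    """Break text into sentences with a naive rule:
    split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def remove_stop_words(sentence: str) -> list:
    """Lowercase the sentence, tokenize it, and drop stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text: str) -> list:
    """Clean the text, split it into sentences, and return one token list per sentence."""
    return [remove_stop_words(s) for s in split_sentences(clean_text(text))]


raw = "<p>There is a change in the API. See https://example.com for details! #nlp</p>"
print(preprocess(raw))
# [['change', 'api'], ['see', 'details']]
```

The order of the steps matters in this sketch: cleaning runs first so that the periods inside URLs are not mistaken for sentence boundaries, and lowercasing happens just before the stop-word lookup so that "There" and "there" are treated alike.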