4.2. Data Cleaning

Data cleaning is an important process for ensuring that the data are of sufficient quality, representative, and unbiased.

The data cleaning process involves the following steps:

  • selection of clean subsets of the data

  • insertion of suitable defaults

  • estimation of missing data by modeling
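
The three steps listed above can be illustrated with a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the sample records, the field names, the helper names (select_clean_subset, insert_defaults, estimate_missing), and the choice of a group mean as the "model" for estimating missing values. A real project would likely use a richer model and a dataframe library.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "country": "DE", "income": 52000.0},
    {"id": 2, "age": None, "country": "FR", "income": 48000.0},
    {"id": 3, "age": 29, "country": None, "income": None},
    {"id": 4, "age": 41, "country": "DE", "income": 61000.0},
]


def select_clean_subset(rows, required):
    """Step 1: keep only rows in which every required field is present."""
    return [r for r in rows if all(r[k] is not None for k in required)]


def insert_defaults(rows, defaults):
    """Step 2: replace missing values with an agreed default per field."""
    return [
        {k: (defaults[k] if v is None and k in defaults else v) for k, v in r.items()}
        for r in rows
    ]


def estimate_missing(rows, field, group_by):
    """Step 3: estimate missing values of `field` from a simple model.

    The "model" here is only the mean of the field within the same group,
    falling back to the overall mean when the group has no observed values.
    """
    observed = [r for r in rows if r[field] is not None]
    overall = mean(r[field] for r in observed)
    group_means = {}
    for key in {r[group_by] for r in observed}:
        group_means[key] = mean(r[field] for r in observed if r[group_by] == key)
    return [
        {**r, field: group_means.get(r[group_by], overall)} if r[field] is None else r
        for r in rows
    ]


clean_subset = select_clean_subset(records, required=["age", "country", "income"])
with_defaults = insert_defaults(records, defaults={"country": "UNKNOWN"})
imputed = estimate_missing(with_defaults, field="income", group_by="country")
```

Each helper returns a new list instead of modifying the input, so the original records remain available for comparison when the effect of each step is documented.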

It is important to provide a data cleaning report containing:

  • decisions and actions that were taken to resolve data quality issues

  • the data transformations that took place in this step

  • possible impact on the result of the project analysis
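
One possible way to keep such a report alongside the cleaning code is sketched below. The CleaningReport class, its field names, and the sample entries are hypothetical and serve only to mirror the three items listed above; they are not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class CleaningReport:
    """Hypothetical container for the three items the report should cover."""

    decisions: list = field(default_factory=list)
    transformations: list = field(default_factory=list)
    possible_impact: list = field(default_factory=list)

    def to_text(self):
        """Render the report as plain text, one section per item."""
        sections = [
            ("Decisions and actions", self.decisions),
            ("Data transformations", self.transformations),
            ("Possible impact on results", self.possible_impact),
        ]
        lines = []
        for title, items in sections:
            lines.append(title + ":")
            lines.extend("  - " + item for item in items)
        return "\n".join(lines)


# Illustrative entries only; real entries come from the actual cleaning work.
report = CleaningReport()
report.decisions.append("Dropped rows with no customer id.")
report.transformations.append("Filled missing 'country' with 'UNKNOWN'.")
report.possible_impact.append("Imputed incomes may understate variance.")
print(report.to_text())
```

Recording an entry at the moment each cleaning decision is made tends to be easier than reconstructing the report afterwards.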