Data cleansing is also known as data cleaning or data scrubbing. It is not, however, the same as data validation. Data cleansing removes or corrects incorrect data already held in a database, whereas data validation happens much earlier: it rejects irrelevant data before that data is entered into the database.
A common cause of inconsistency in a data set is duplicated entries in the database. Data cleaning can be done manually or automatically using data cleansing software. A database can contain any of the following:
- Inconsistent data
- Missing data
- Incomplete data
- Duplicate data
- Incorrect data
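As a rough illustration of how several of these defect types might be detected, here is a minimal Python sketch using only the standard library. The sample records, the field names (`id`, `name`, `email`, `country`), the simple email pattern, and all function names are assumptions made for this example; real data cleansing software will apply far richer rules.

```python
import re

# Hypothetical sample records; field names are assumptions for illustration.
RECORDS = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "country": "US"},
    {"id": 2, "name": "Bob Roy", "email": None, "country": "USA"},
    {"id": 3, "name": "ann lee ", "email": "ANN@example.com", "country": "US"},
    {"id": 4, "name": "Cy Dunn", "email": "cy.dunn[at]example", "country": "US"},
]

# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize(value):
    """Lowercase and collapse whitespace so near-identical values compare equal."""
    return " ".join(str(value).lower().split())


def find_missing(records, field):
    """Return ids of records where the field is absent, None, or empty."""
    return [r["id"] for r in records if r.get(field) in (None, "")]


def find_incorrect_emails(records):
    """Return ids of records whose non-empty email fails the pattern check."""
    return [
        r["id"]
        for r in records
        if r.get("email") and not EMAIL_PATTERN.match(r["email"])
    ]


def find_duplicates(records, key_fields):
    """Return (first_id, duplicate_id) pairs sharing the same normalized key."""
    seen = {}
    duplicates = []
    for r in records:
        key = tuple(normalize(r.get(f)) for f in key_fields)
        if key in seen:
            duplicates.append((seen[key], r["id"]))
        else:
            seen[key] = r["id"]
    return duplicates


def find_inconsistent(records, field, allowed):
    """Return ids of records whose field value is outside the agreed set."""
    return [r["id"] for r in records if r.get(field) not in allowed]
```

For example, `find_duplicates(RECORDS, ["name", "email"])` pairs records 1 and 3, which differ only in letter case and trailing whitespace, while `find_inconsistent(RECORDS, "country", {"US"})` flags record 2 for using "USA" where the other records use "US".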
Automated data cleansing draws on methods such as statistical outlier detection, pattern matching, clustering, and data mining techniques. The overall process of data cleansing can be summarized in the following procedure.
- Identify authoritative data sources
- Measure data quality
- Use discovery tools to identify the bad data
- Use data cleansing tools or software
- Enter only clean data into the data warehouse
- Identify the cause of data corruption
- Correct the cause of data defects and corruption
- Schedule periodic cleansing of the source data
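A few of these steps can be sketched in Python. The example below is only an illustration under stated assumptions: it measures data quality as a simple completeness ratio, flags statistical outliers with a modified z-score based on the median absolute deviation (one of many possible outlier tests, chosen here because it works on small samples), and separates clean rows from rejected ones before loading. The function names, the `amount` field, and the 3.5 threshold are hypothetical choices for this sketch.

```python
from statistics import median


def is_filled(value):
    """Treat None and the empty string as missing; 0 counts as a real value."""
    return value not in (None, "")


def completeness(records, fields):
    """Fraction of cells that are filled in: one simple data quality measure."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if is_filled(r.get(f)))
    return filled / total


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def split_clean(records, fields):
    """Separate rows that are complete from rows that need attention."""
    clean, rejected = [], []
    for r in records:
        if all(is_filled(r.get(f)) for f in fields):
            clean.append(r)
        else:
            rejected.append(r)
    return clean, rejected


# Hypothetical source rows used to exercise the functions above.
rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 12},
    {"id": 4, "amount": ""},
]
quality = completeness(rows, ["id", "amount"])
clean_rows, rejected_rows = split_clean(rows, ["id", "amount"])
suspect_values = mad_outliers([10, 12, 11, 13, 12, 11, 95])
```

Only `clean_rows` would go on to the data warehouse in this sketch. The rejected rows and the suspect values are the starting point for finding and correcting the cause of the defects, and re-running the same checks on a schedule corresponds to the periodic cleansing step.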
Managed Outsource Solutions is a US-based data entry outsourcing company that offers professional data cleansing services, along with data entry, data processing, data capture, data extraction, data mining, KDD, and data validation services, to clients in the US, Canada, the UK, and Australia.