Data cleansing is also known as data cleaning or data scrubbing. Data entry can introduce errors such as wrong values, invalid entries and spelling mistakes. When data from different sources is integrated into a data warehouse, a database system or a web-based system, it always has to be cleaned so that it is consistent and correct. Data cleaning is therefore necessary to maintain a good level of data quality in any environment.
The data cleansing process starts with detecting errors and inconsistencies in the data so that its overall quality can be improved. This can be done manually or with data analysis programs. Data samples are also taken from the database for analysis. Data cleansing mainly involves removing duplicate data and identifying and rectifying missing information. Data cleaning is not done in isolation; it has to be carried out together with data transformation. The ETL (extraction, transformation and loading) process precedes the formation of a data warehouse. Often a schema translation step is used to map the data to a common model. Data may come from a single source or from multiple sources, and the cleaning workflow is mapped and designed accordingly.
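The two core tasks mentioned above, removing duplicates and identifying missing information, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`name`, `email`, `city`), the helper names `normalize` and `clean_records`, and the rule that two records are duplicates when their key fields match after trimming whitespace and ignoring case are all hypothetical choices, not part of any particular tool.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> str:
    """Collapse whitespace and lowercase a value so near-identical entries compare equal."""
    if value is None:
        return ""
    return " ".join(value.split()).lower()


def clean_records(
    records: List[Record],
    key_fields: List[str],
    required_fields: List[str],
) -> Tuple[List[Record], List[Tuple[int, List[str]]]]:
    """Drop duplicate records and report which kept records have missing fields.

    Returns the de-duplicated records and a list of
    (index in the de-duplicated list, names of missing fields).
    """
    seen = set()
    unique: List[Record] = []
    incomplete: List[Tuple[int, List[str]]] = []

    for record in records:
        # Assumed duplicate rule: same key fields after normalization.
        key = tuple(normalize(record.get(field)) for field in key_fields)
        if key in seen:
            continue  # duplicate of an earlier record, skip it
        seen.add(key)
        unique.append(record)

        # A field counts as missing if it is absent, None, or blank.
        missing = [f for f in required_fields if not normalize(record.get(f))]
        if missing:
            incomplete.append((len(unique) - 1, missing))

    return unique, incomplete


# Hypothetical sample data for illustration only.
sample = [
    {"name": "Ann Lee", "email": "ann@example.com", "city": "Tulsa"},
    {"name": "ann  lee ", "email": "ANN@example.com", "city": "Tulsa"},
    {"name": "Bob Ray", "email": "bob@example.com", "city": ""},
]
unique, incomplete = clean_records(
    sample, key_fields=["name", "email"], required_fields=["name", "email", "city"]
)
print(len(unique), incomplete)
```

The sketch only flags missing values; how they are rectified (looked up in another source, corrected by hand, or left blank) depends on the data and is a separate decision.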
Data verification is then done to check the correctness of the preceding workflow. This is followed by data transformation and the backflow of data, in which the new clean data replaces the old data in the legacy system. Data cleaning is done using many different types of tools. These may include:
- ETL tools
- Data analysis tools
- Data mining tools
- Data reengineering tools
- Duplicate elimination tools
- Domain cleaning tools
- Name/address cleaning tools
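As a rough illustration of what a name/address cleaning tool from the list above might do, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names `clean_name` and `clean_address`, the small `STREET_ABBREVIATIONS` table and the capitalization rules are hypothetical, and real tools rely on much larger reference data.

```python
# Hypothetical lookup table; a real tool would use far more complete reference data.
STREET_ABBREVIATIONS = {
    "st": "Street",
    "ave": "Avenue",
    "rd": "Road",
    "blvd": "Boulevard",
}


def clean_name(raw: str) -> str:
    """Collapse extra whitespace and capitalize each part of a personal name.

    Simplification: names with internal capitals (e.g. "McDonald") are not preserved.
    """
    return " ".join(part.capitalize() for part in raw.split())


def clean_address(raw: str) -> str:
    """Collapse whitespace, expand known street abbreviations, capitalize plain words.

    Simplification: "St" is always treated as "Street", never "Saint".
    """
    cleaned = []
    for token in raw.split():
        key = token.rstrip(".").lower()
        if key in STREET_ABBREVIATIONS:
            cleaned.append(STREET_ABBREVIATIONS[key])
        elif token.isalpha():
            cleaned.append(token.capitalize())
        else:
            cleaned.append(token)  # house numbers, postal codes, etc. stay as-is
    return " ".join(cleaned)


print(clean_name("  jOHN   smith "))
print(clean_address("12  main st."))
```

Standardizing names and addresses this way before comparison also makes duplicate elimination more reliable, since differently formatted entries for the same person then compare as equal.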
Managed Outsource Solutions (MOS) is a US-based data entry and outsourcing services provider that offers a range of services, including data entry, data mining, data conversion, data capture, data processing, data cleansing and document management.