Data cleaning is essential for maintaining data quality in a data warehouse. Whenever data from outside sources is to be incorporated into an existing database, it must first be cleaned and made fit to join the existing data, because only clean, up-to-date data can be relied upon. ETL (Extract, Transform, Load) tools are commonly used to do this cleaning.
Data cleaning (also called data cleansing) with an ETL tool involves three steps: extracting data from archival or operational systems, cleaning and validating it against the specific business requirements, and loading it into an existing database or data warehouse. Some of the common ETL tools available in the market today include:
- Oracle Warehouse Builder – http://www.oracle.com/technology/products/warehouse/index.html
- Microsoft SQL Server Integration Services (SSIS) – http://msdn2.microsoft.com/en-us/sql/Aa336312.aspx
- IBM WebSphere DataStage – http://ibm.ascential.com/products/datastage.html
- Cognos DecisionStream – http://www.cognos.com/products/tour/decisionstream
- Informatica PowerCenter – http://www.informatica.com/products/powercenter/default.htm
What are the functions of an ETL tool? An ETL tool needs, at a minimum, either a strong extraction capability or a strong transformation capability; a good tool is strong in both. Because data sources vary from one company to another, care should be taken when choosing an ETL tool to ensure that it can connect directly to the source of the data. An ETL tool must also have a good scheme for managing metadata.
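To make the extract, clean/validate and load steps concrete, here is a minimal sketch in Python using only the standard library. It is not how any of the tools listed above work internally. The sample data, the `customers` table, the function names and the two validation rules (a name is required, an email must contain "@") are all assumptions chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (illustrative data only).
RAW_CSV = """customer_id,name,email
1, Alice Smith ,ALICE@EXAMPLE.COM
2,Bob Jones,bob@example.com
2,Bob Jones,bob@example.com
3,,carol@example.com
4,Dan Brown,not-an-email
"""


def extract(text):
    """Extract: read rows from the source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalise case, validate, de-duplicate."""
    cleaned, seen = [], set()
    for row in rows:
        name = row["name"].strip()
        email = row["email"].strip().lower()
        # Assumed business rules: name is required, email must contain "@".
        if not name or "@" not in email:
            continue
        key = int(row["customer_id"])
        if key in seen:  # skip repeated customer ids
            continue
        seen.add(key)
        cleaned.append((key, name, email))
    return cleaned


def load(conn, records):
    """Load: add the cleaned records to the (possibly existing) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany("INSERT OR IGNORE INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn, text=RAW_CSV):
    """Run the three steps in order and return what was loaded."""
    records = transform(extract(text))
    load(conn, records)
    return records
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` loads only the rows that pass the assumed rules; the duplicate, the row with no name and the row with a malformed email are dropped before loading. A real ETL tool would typically log or quarantine rejected rows rather than discard them silently.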
Managed Outsource Solutions (MOS) is a US-based data entry and data mining company that offers a wide range of services, including data entry, data processing, data conversion, data cleansing, online data entry, KDD, image management, web extraction, data analysis, legal data entry, ICR services, OCR services and data mining.