Data cleaning, otherwise known as data cleansing or data scrubbing involves detecting and correcting corrupt, inaccurate, incomplete, and irrelevant records in a record set, table or database. The coarse data is then replaced, modified, or deleted. Partnering with experienced data cleansing companies help businesses maintain complete, accurate and up-to-data database. Microsoft Excel is the most popular software used to store valuable data. The stored data must be clean and reliable. By cleansing data, business enterprises can boost their customer acquisition efforts, improved decision making process, streamline business practices, save a lot of money, and ensure that employees are making the best use of their work hours. Clean and actionable data prevent the staff from contacting customers with out-of-date information or creating invalid vendor files in the system. Since the records are clean, it helps maximize the staff’s efficiency and productivity, and improves their response rates.
According to a survey conducted by Pharma Intelligence and sponsored by Oracle Health Sciences, lack of confidence in clinical trial data cause delay in drug development. Besides trial delays, 81 percent of respondents pointed out data governance issues as the biggest challenge in meeting regulatory compliance. The survey respondents said that the top three data issues were duplicate data/inconsistent data, data quality and data integrity/traceability. 51 percent cited data completeness, 45 percent said data quality and 43 percent, data cleaning are the top three operational challenges that were with the clinical trial data. Therefore, companies should focus more on improving quality of their data.
Some Techniques to Clean Data in Excel
The following are some of the techniques that can be used to clean data in Excel
- Remove Extra Spaces– Extra spaces between words can be removed with the help of trim function.
This function takes one single argument which could either be the text which you type manually or it could be the cell reference. Trim function removes all the leading spaces and trailing spaces and extra spaces between words except one single space that is allowed.
Select and treat all blank cells – If you want to fill blank cells with ‘0’ or ‘Not Available’ or simply want to highlight it, select the entire data set; press F5, this opens the “Go To” dialogue box and click on special button at the bottom left. This again opens the “Go To Special” dialogue box. Then click blank and click ok. This selects all the blank cells in your data set at the same time. Now you have these cells in grey and the first cell is in white as this is the active cell. Then start typing “0” or “Not Available” in all these cells and hit “Control + Enter”.
- Convert Numbers Stored as Text into Numbers – When data is imported from text files or external databases, numbers get stored as text. Some people use an apostrophe before a number to make it text. This can lead to errors if you are using these cells in calculations. Here is how to convert these numbers stored as text back into numbers. Type 1 in any blank cell and then select the cell where you typed 1 and press Control + C. After that, select the cell or range which you want to convert to numbers. Select Paste – Paste Special or press Alt + E + S. At that time “Paste Special” dialogue box appears and select “Multiply” from the box. Click OK and this converts all the numbers in text format back to numbers.
- Remove Duplicates – This involves highlighting duplicate data and deleting it. To highlight duplicate data, select the data and go to Home, select Conditional Formatting, click Highlight Cells Rules and select Duplicate Values. Specify the formatting and all the duplicate values get highlighted. To remove duplicates, select the data and Go to Data and click Remove Duplicates. If the data has headers, ensure that the checkbox at the top right is checked. Select the Columns from which duplicates have to be removed and click OK. This removes duplicate values from the list.