Data mining from text is also known as “text mining”, “text data mining” and KDT (knowledge discovery in text). It is the process of identifying useful patterns in unstructured data, an attempt to turn raw, unstructured text into higher-level information and knowledge. Besides conventional data mining techniques, other techniques used for mining text include:
- Natural language processing
- Machine learning
- Knowledge management
- Information extraction
Text mining draws on many sources of documents, which may be either unstructured or semi-structured. Semi-structured texts are documents that use markup languages such as SGML and XML. Patterns identified in texts can be either predictive or informative. Text mining differs from conventional data mining in its input: data mining is concerned with structured, usually numerical data, whereas text mining works with unstructured data. Extracting information is nevertheless the final aim of both processes. Text mining tasks include text categorization, clustering, concept extraction, entity relation analysis and modeling, and document summarization.
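To make the idea of turning unstructured text into structured data concrete, here is a minimal sketch of text categorization in Python. It is only an illustration under stated assumptions, not a description of any particular product: the stopword list, the function names (tokenize, term_vector, cosine_similarity, categorize) and the two labelled example sentences are all hypothetical. Each text is reduced to a bag of word counts, and a new text receives the label of the most similar example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_vector(text):
    """Turn unstructured text into a structured bag of word counts."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(text, labelled_examples):
    """Return the label of the example most similar to the given text."""
    vec = term_vector(text)
    best_label, best_score = None, -1.0
    for label, example in labelled_examples:
        score = cosine_similarity(vec, term_vector(example))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented example sentences, used only to demonstrate the mechanics.
examples = [
    ("security", "agencies intercept signals and gather intelligence"),
    ("commerce", "customers buy products and write reviews"),
]

label = categorize("analysts gather intelligence from intercepted signals", examples)
print(label)
```

The query shares the words "gather", "intelligence" and "signals" with the first example and none with the second, so it is assigned the "security" label. Real systems usually add steps such as stemming and term weighting, and train on far larger labelled collections.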
Text data mining is useful in many areas, with commercial, software, academic and security applications. In the field of security, text mining is used to interpret data and signals and to gather intelligence. The ECHELON surveillance project, which involves intelligence agencies from several countries including the USA and the UK, uses this technology. A wide range of commercial text mining software is available, alongside open source applications.
Managed Outsource Solutions (MOS) is a US-based data entry and data processing company that offers professional data mining, KDD, data cleansing, web extraction and data conversion services to clients in the US, the UK, Canada and Australia.