It’s no secret that the world has seen an explosion of information in the recent past, an explosion that experts predict will continue as billions of users expand their use of online resources and Internet penetration increases. Further, the participative Web now allows users to become co-creators of content rather than merely its consumers. Text constitutes the largest part of Web content. While text documents and the traditional science of Information Retrieval have existed for a long time, the storage of text in electronic form and the resulting ease of dissemination and sharing over the Internet have changed the scenario. We now witness large volumes of text stored electronically, along with new ways of exploiting them to obtain useful information and inferences. Text analytics, in its simplest form, may be understood as a set of methods for extracting usable knowledge from unstructured text data. It describes a set of linguistic, statistical, and machine learning techniques that model, structure, and process the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation. The essence of text analytics is thus to take very large collections of unstructured text documents and extract useful intelligence from them.
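As a purely illustrative sketch, not drawn from the session papers, the snippet below shows one common way in which such unstructured text can be given a machine-readable structure. The example documents are invented, and scikit-learn’s TfidfVectorizer stands in for the broader family of statistical techniques mentioned above.

```python
# Illustrative sketch: turning unstructured text into a structured,
# machine-readable representation using TF-IDF term weights.
# The three example documents are invented for demonstration only.
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "Users co-create content on the participative Web.",
    "Text analytics extracts usable knowledge from unstructured text.",
    "Machine learning systems learn patterns from data.",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(documents)  # sparse documents-by-terms matrix

# Each document is now a weighted vector over the induced vocabulary,
# i.e. the "structure" that downstream mining algorithms operate on.
print(vectorizer.get_feature_names_out())
print(doc_term_matrix.toarray().round(2))
```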
Text documents are structured for reading by people, but they are unstructured as far as data extraction, or reading by a machine, is concerned. Machine learning algorithms and techniques address this problem by enabling the discovery of useful knowledge from enormous collections of documents. Traditional data mining technologies mine knowledge from data structured according to well-formed schemas, such as relational tables. Text data, however, follows no such schema; the information is described freely within the documents. Therefore, a sophisticated set of algorithms and techniques, drawing its building blocks from machine learning, language processing, and statistical methods, is required. Machine learning, commonly regarded as a branch of artificial intelligence, is primarily concerned with the construction and study of systems that can learn from data. With text documents being ‘the data’ of today, we need to (a) explore the applicability of machine learning algorithms and techniques; (b) critically evaluate them; and (c) design new algorithmic formulations and systems for obtaining useful inferences from textual data. The special session on ‘Machine Learning and Text Analytics’ aims to bring together researchers and professionals working in this area to address this goal, examine the problems and issues, and contribute to the state of the art.
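To make point (a) above concrete, the following sketch applies an off-the-shelf learner to a tiny, invented labelled corpus. The documents, labels, and the choice of a TF-IDF plus Multinomial Naive Bayes pipeline are assumptions used only to illustrate how a machine learning algorithm can learn from, and be evaluated on, textual data.

```python
# Illustrative sketch: a minimal supervised text-classification pipeline.
# Corpus, labels, and the test query are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "stock prices rose after the earnings report",
    "the central bank adjusted interest rates",
    "the team won the championship final",
    "the striker scored twice in the match",
]
train_labels = ["finance", "finance", "sport", "sport"]

# Vectorize the raw text and fit a simple probabilistic classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# The fitted model can now assign a label to previously unseen text.
print(model.predict(["the goalkeeper saved a penalty"]))  # expected: ['sport']
```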