Extract, Analyze and Represent Relational Data from Texts
AutoMap is a text mining tool developed by CASOS at Carnegie Mellon.
Input: one or more unstructured texts.
Output: DyNetML files and CSV files.
AutoMap is designed to work seamlessly with ORA.
AutoMap enables the extraction of information from texts using Network Text Analysis methods. It supports the extraction of several types of data from unstructured documents: content analytic data (words and frequencies), semantic network data (the network of concepts), meta-network data (the cross-classification of concepts into ontological categories such as people, places, and things, and the connections among these classified concepts), and sentiment data (attitudes, beliefs). Extraction of each type of data assumes that the previously listed type has already been extracted.
AutoMap is part of a text mining suite that includes a series of pre-processors for cleaning raw texts so that they can be processed, and a set of post-processors that employ semantic inferencing to improve the coding and deduce missing information. The pre-processors include sub-tools such as a PDF-to-text converter, non-printing character removal, and limited types of deduplication. Text pre-processing condenses data into concepts, which capture the features of the texts relevant to the user. Statement formation rules determine how to link extracted concepts into networks. The post-processors include procedures that link to gazetteers and augment the coding with latitude and longitude, belief inference procedures, and secondary data cleaning tools. In addition, there is a series of support tools for creating, maintaining, and editing delete lists, generalization thesauri, and meta-network thesauri.
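To illustrate how a delete list and a generalization thesaurus can condense raw text into concepts, here is a minimal Python sketch. It is not AutoMap's implementation: the function name `preprocess`, the tokenization rule, and the sample entries are assumptions made for illustration only.

```python
import re

def preprocess(text, delete_list, generalization_thesaurus):
    """Condense raw text into a list of concepts (illustrative only).

    delete_list: set of lowercase words to drop.
    generalization_thesaurus: dict mapping a lowercase phrase to the
        single concept that should replace it.
    """
    text = text.lower()
    # Apply longer phrases first so a short entry cannot split a longer one.
    for phrase in sorted(generalization_thesaurus, key=len, reverse=True):
        concept = generalization_thesaurus[phrase]
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # A function replacement avoids any escape handling in the concept.
        text = re.sub(pattern, lambda match, c=concept: c, text)
    # Hypothetical tokenization rule: runs of letters, digits, underscores.
    tokens = re.findall(r"[a-z0-9_]+", text)
    return [token for token in tokens if token not in delete_list]

# Hypothetical sample data
thesaurus = {
    "united states of america": "usa",
    "united states": "usa",
}
deletions = {"the", "with", "of"}
concepts = preprocess(
    "The United States of America signed the treaty with Canada.",
    deletions,
    thesaurus,
)
# concepts == ["usa", "signed", "treaty", "canada"]
```

Applying the thesaurus before the delete list matters in this sketch: the word "of" is on the delete list, and removing it first would prevent the phrase "united states of america" from matching.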
AutoMap uses parts of speech tagging and proximity analysis to do computer-assisted Network Text Analysis (NTA). NTA encodes the links among words in a text and constructs a network of the linked words.
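The idea of proximity-based linking can be sketched as follows. This is a simplified illustration under stated assumptions, not AutoMap's actual algorithm: it assumes the text has already been reduced to an ordered list of concepts, that a window of size n links each concept to the n - 1 concepts that follow it, and that links are directed in reading order. The name `extract_links` and the `window_size` parameter are hypothetical.

```python
from collections import Counter

def extract_links(concepts, window_size=2):
    """Count directed links between concepts that share a sliding window.

    A window of size n covers n consecutive concepts, so each concept is
    linked to the n - 1 concepts that follow it.
    """
    links = Counter()
    for i, source in enumerate(concepts):
        for target in concepts[i + 1 : i + window_size]:
            if source != target:  # ignore self-links
                links[(source, target)] += 1
    return links

concepts = ["usa", "signed", "treaty", "canada"]
network = extract_links(concepts, window_size=3)
# network holds five links, each with weight 1:
# usa->signed, usa->treaty, signed->treaty, signed->canada, treaty->canada
```

A larger window yields a denser network, so the window size is a modeling choice that should reflect how far apart related concepts typically appear in the texts.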
AutoMap subsumes classical Content Analysis by analyzing the existence, frequencies, and covariance of terms and themes.
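As a rough illustration of these three measures, the following Python sketch computes term existence, per-text frequencies, and the sample covariance of two terms across a small corpus. The function names and the sample data are assumptions for illustration and do not reflect AutoMap's internals.

```python
from collections import Counter

def term_frequencies(tokenized_texts):
    """Return one frequency table (Counter) per text."""
    return [Counter(tokens) for tokens in tokenized_texts]

def term_covariance(frequencies, term_a, term_b):
    """Sample covariance of two terms' counts across texts.

    Requires at least two texts, because the sum is divided by n - 1.
    """
    xs = [table[term_a] for table in frequencies]
    ys = [table[term_b] for table in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical corpus of three already-tokenized texts
texts = [
    ["war", "war", "peace"],
    ["peace", "treaty"],
    ["war", "treaty", "treaty"],
]
freqs = term_frequencies(texts)

exists_in_second = "war" in freqs[1]                  # False
war_counts = [table["war"] for table in freqs]         # [2, 0, 1]
cov = term_covariance(freqs, "war", "treaty")          # -0.5
```

A negative covariance here means that, in this toy corpus, texts that mention "war" more often tend to mention "treaty" less often.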
AutoMap has been implemented in Java 1.7.
It can operate in both a front-end mode with a GUI and a back-end mode.
The main functions of AutoMap are:
- Extract, analyze and compare mental models of individuals and groups.
- Reveal structure of social and organizational systems from texts.
AutoMap also offers a variety of techniques for pre-processing Natural Language:
- Named-Entity Recognition
- Stemming (Porter, KStem)
- Collocation (Bigram) Detection
- Extraction routines for dates, events, parts of speech
- Thesaurus development and application
- Flexible ontology usage
- Parts of Speech Tagging
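As one example of these techniques, collocation (bigram) detection can be sketched with a simple frequency threshold. The criterion AutoMap actually applies is not described here, so the threshold rule, the function name `detect_bigrams`, and the `min_count` parameter are assumptions for illustration.

```python
from collections import Counter

def detect_bigrams(tokens, min_count=2):
    """Return adjacent word pairs that occur at least min_count times."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {pair: count for pair, count in pair_counts.items() if count >= min_count}

tokens = "social network analysis uses social network data".split()
bigrams = detect_bigrams(tokens)
# bigrams == {("social", "network"): 2}
```

Detected bigrams such as "social network" are natural candidates for entries in a generalization thesaurus, where the pair can be mapped to a single concept.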
The employed algorithm for map analysis is based on Carley's approach to coding texts as cognitive maps and Danowski's approach for proximity analysis.
AutoMap is also a part of the CASOS Summer Institute. At the CASOS Summer Institute, CASOS Ph.D. students have the chance to display and discuss their projects and work.
The 2010 CASOS Summer Institute posters for AutoMap are:
- "AutoMap: Extracting usable information from unstructured texts"
- "Relation Extraction from Texts and Computational Integration of Words and Networks"
The 2008 CASOS Summer Institute poster for AutoMap is:
ORA Google Group
The ORA Google Group provides a forum for questions, collaborations, and information related to CASOS tools. Please visit this link for instructions on becoming a member of the ORA Google Group: How to Join the ORA Google Group.