Geo-Spatially Enabled Dynamic Network Analysis
Probabilistic Graphical Models
Often we are able to retrieve several types of data on a geospatial system -- for example, we can correlate spatial and temporal information such as AIS data with relational information such as that found in Lloyd's Register. Ideally, we would be able to detect patterns that allow us to use geospatial data to infer relationships, and relationships to predict geospatial behavior. However, most current literature approaching this task does not truly integrate the two types of data, instead using one dataset as a trivial preprocessor for the other (e.g. clustering only those ships sharing a common owner). Methods like this are limited in their ability to discover complex patterns.
A promising alternative comes in the form of probabilistic graphical models (PGMs), which can encode dependencies between many kinds of variables in the data. PGMs are a very general representation capable of encoding both relational and spatial data, making them an excellent choice for finding patterns spanning datasets. Working out the structure of graphical models that allow us to efficiently find interesting patterns in large geospatial datasets is an important theoretical advance.
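To make the idea concrete, here is a minimal sketch of a two-variable directed graphical model linking one relational variable (whether two ships share an owner) to one spatial variable (whether they are observed at the same port). The variable names and all probabilities are invented for illustration and do not come from AIS or Lloyd's data; a real model would have many more variables and learned parameters.

```python
# Hypothetical two-node directed model: SameOwner -> CoLocated.
# All numbers below are made-up illustrative values, not estimates from data.
P_SAME_OWNER = 0.10                             # prior P(SameOwner = True)
P_COLOCATED_GIVEN = {True: 0.60, False: 0.05}   # P(CoLocated = True | SameOwner)


def joint(same_owner: bool, colocated: bool) -> float:
    """Joint probability from the factorization P(S) * P(C | S)."""
    p_s = P_SAME_OWNER if same_owner else 1.0 - P_SAME_OWNER
    p_c = P_COLOCATED_GIVEN[same_owner]
    return p_s * (p_c if colocated else 1.0 - p_c)


def posterior_same_owner(colocated: bool) -> float:
    """Spatial evidence -> relationship: P(SameOwner = True | CoLocated)."""
    numerator = joint(True, colocated)
    return numerator / (numerator + joint(False, colocated))


def marginal_colocated() -> float:
    """Relationship model -> spatial prediction: P(CoLocated = True)."""
    return joint(True, True) + joint(False, True)


if __name__ == "__main__":
    print(posterior_same_owner(True))
    print(marginal_colocated())
```

With these assumed numbers, observing two ships at the same port raises the probability of common ownership from 0.10 to roughly 0.57, which illustrates how one model can run inference in both directions between spatial and relational data.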
Localized Pattern Detection
Most machine learning algorithms aim to find global patterns that explain a large proportion of the data. However, many social and geospatial patterns are highly localized. For example, the port queuing system at Le Havre is specific to that port, and neither predicts nor is predicted by observing traffic elsewhere. Clustering is an approach that allows us to segment our data and develop models that only apply in specific situations. However, the theoretical work has not been done to integrate clustering approaches with the PGMs discussed above. Bridging the two would allow us to discover an unknown number of highly localized patterns whose signature spans both spatial and relational data.
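The contrast between global and localized models can be sketched with plain k-means followed by a separate summary per cluster. This is only a stand-in for the integration described above: the number of clusters is fixed in advance here, whereas the goal stated in the text is to discover an unknown number of patterns, and no PGM is involved. The coordinates, waiting times, and function names are all hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on 2D points; returns (centroids, labels)."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster in place
        centroids = updated
    return centroids, labels


def local_means(values, labels):
    """Mean of `values` within each cluster: one tiny 'local model' per cluster."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


# Hypothetical ship positions near two ports, with waiting times in hours.
positions = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
wait_hours = [10.0, 12.0, 11.0, 1.0, 2.0, 1.5]

centroids, labels = kmeans(positions, [(0.0, 0.0), (5.0, 5.0)])
per_cluster = local_means(wait_hours, labels)
global_mean = sum(wait_hours) / len(wait_hours)
```

In this toy data the global mean waiting time (6.25 hours) describes neither port well, while the per-cluster means (11.0 and 1.5 hours) capture the local behavior at each one.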
Anomaly Detection
Pattern detection usually requires observing many instances of the same pattern. In many security scenarios this is impossible: we do not have enough well-documented examples of covert networks, terrorist attacks, or smuggling operations to truly train a statistical model of such behaviors. An attractive alternative is to build systems that recognize normal behavior in terms of spatial and temporal movements, and report as anomalies any behavior that differs significantly. Adapting PGMs to this purpose would allow us to move from the pattern discovery described above to a system whose outputs help human analysts triage their attention.
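The "model normal behavior, report deviations" idea can be sketched with a single Gaussian fitted to one movement feature. This is a deliberately simple stand-in for a PGM-based model of normal behavior; the feature (vessel speed in knots), the sample values, and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
import statistics


def fit_normal_model(speeds):
    """Summarize normal behavior as the mean and sample standard deviation."""
    return statistics.mean(speeds), statistics.stdev(speeds)


def anomaly_score(speed, model):
    """Distance from the normal mean, measured in standard deviations."""
    mean, std = model
    return abs(speed - mean) / std


def flag_anomalies(observations, model, threshold=3.0):
    """Return the observations whose score exceeds the (assumed) threshold."""
    return [obs for obs in observations if anomaly_score(obs, model) > threshold]


# Hypothetical speeds (knots) from tracks assumed to be normal transits.
normal_speeds = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
model = fit_normal_model(normal_speeds)

# New observations: only the sharp slowdown should be reported.
flagged = flag_anomalies([12.0, 2.5, 12.6], model)
```

Here only the 2.5-knot observation is flagged. No labeled examples of illicit behavior were needed; the model was trained solely on movements assumed to be normal, and the flagged items are candidates for analyst review rather than confirmed events.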
The goal of this task is to develop new methods for data mining across a wide variety of data sources that have been linked based on network analysis. More specifically, this task proposes to incorporate the spatial and temporal information that CMU is adding to ORA. This will take us beyond data mining alone and into the realm of spatial data mining. The primary approach that CMU intends to use is probabilistic graphical models (PGMs). PGMs combine characteristics of probability theory and graph theory. This combination allows analysts to calculate or assign probabilities in a network using Bayesian probability. Nodes represent random variables and links represent dependencies between them, so that the graph as a whole encodes a joint distribution. If the graphs are directed, many familiar models can be expressed this way: hidden Markov models, factor analysis, probabilistic principal component analysis, etc. Undirected graphs can also be used with PGMs. While I do not pretend to understand all of this fully, it seems likely to provide many benefits. Most importantly, it will allow us to start getting a handle on uncertainty in the network. How the spatial data will work into this is the research question being addressed here. The last two tasks (localized pattern detection and anomaly detection) are some of the applications that we hope to develop once the initial work has been done with the PGMs. This work sits at the crossroads of several technologies, and so is likely to afford major advances.
At the CASOS Summer Institute, CASOS Ph.D. students have the chance to display and discuss their projects and work. The 2008 CASOS Summer Institute posters for Geo-Spatially Enabled Dynamic Network Analysis are:
"Unsupervised Plan Detection In Maritime GPS Data"
"Estimated Mission Execution Patterns from Multi-Agent Simulation"
"Extending ORA for Spatial and Temporal Data"