Overview

Graph clustering and graph outlier detection have been studied extensively on plain graphs, with various applications. Recently, algorithms have been extended to graphs with attributes as often observed in the real-world. However, all of these techniques fail to incorporate the user preference into graph mining, and thus, lack the ability to steer algorithms to more interesting parts of the attributed graph. In this work, we overcome this limitation and introduce a novel user-oriented approach for mining attributed graphs. The key aspect of our approach is to infer user preference by the so-called focus attributes through a set of user-provided exemplar nodes. In this new problem setting, clusters and outliers are then simultaneously mined according to this user preference. Specifically, our FocusCO algorithm identifies the focus, extracts focused clusters and detects outliers. Moreover, FocusCO scales well with graph size, since we perform a local clustering of interest to the user rather than global partitioning of the entire graph. We show the effectiveness and scalability of our method on synthetic and real-world graphs, as compared to both existing graph clustering and outlier detection approaches.

Presentation

Code

An implementation of Focused Clustering and Outlier Detection (FocusCO) is now available on GitHub. It is a very much ‘research’ code, but may perhaps be useful.

Data

We used these attributed graph datasets:

Citing

If you find our work useful in your research, we ask that you cite the following paper:

@inproceedings{Perozzi:2014:FCO:2623330.2623682,
 author = {Perozzi, Bryan and Akoglu, Leman and Iglesias S\'{a}nchez, Patricia and M\"{u}ller, Emmanuel},
 title = {Focused Clustering and Outlier Detection in Large Attributed Graphs},
 booktitle = {Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
 series = {KDD '14},
 year = {2014},
 isbn = {978-1-4503-2956-9},
 location = {New York, New York, USA},
 pages = {1346--1355},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2623330.2623682},
 doi = {10.1145/2623330.2623682},
 acmid = {2623682},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {attributed graphs, clustering, distance metric learning, focused graph mining, infer user preference, outlier mining},
}