Updates: There are a couple of changes to the session chairs. Research Session 14: Sequence Data, originally chaired by Toon Calders, will be chaired by Jennifer Dy. Research Session 13: Collaborative Filtering and Matrices, originally chaired by Naren Ramakrishnan, will be chaired by Chris Ding.

Combined Session 1: Topic Modeling

Monday 10:30 am - 12:30 pm, Kenitra

Chair: Ken Church

25-minute presentations

  • (I) Learning from Multi-Topic Web Documents for Contextual Advertising. Y. Zhang, A. C. Surendran, J. C. Platt, M. Narasimhan. (Best Application Paper Award Runner-up)
  • (R) Probabilistic Latent Semantic Visualization: Topic Model for Visualizing Documents. T. Iwata, T. Yamada, N. Ueda.
  • (R) Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation. I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, M. Welling.

15-minute presentations

  • (R) Mining Multi-Faceted Overviews of Arbitrary Topics in a Text Collection. X. Ling, Q. Mei, C. Zhai, B. Schatz.
  • (R) Joint Latent Topic Models for Text and Citations. R. Nallapati, A. Ahmed, E. P. Xing, W. W. Cohen.
  • (R) Topical Query Decomposition. F. Bonchi, C. Castillo, D. Donato, A. Gionis.

Research Session 1: Data Integration

Monday 10:30 am - 12:30 pm, Tangier

Chair: Hui Xiong

25-minute presentations

  • A Unified Approach for Schema Matching, Coreference and Canonicalization. M. Wick, K. Rohanimanesh, K. Schultz, A. McCallum.
  • De-duping URLs via Rewrite Rules. A. Dasgupta, R. Kumar, A. Sasturkar.
  • Unsupervised Deduplication using Cross-Field Dependencies. R. Hall, C. Sutton, A. McCallum.

15-minute presentations

  • Automatic Record Linkage using Seeded Nearest Neighbour and Support Vector Machine Classification. P. Christen.
  • Entity Categorization Over Large Document Collections. V. Ganti, A. C. König, R. Vernica.
  • Identifying Biologically Relevant Genes via Multiple Heterogeneous Data Sources. Z. Zhao, J. Wang, H. Liu, J. Ye, Y. Chang.

Research Session 2: Social Networks

Monday 10:30 am - 12:30 pm, Rabat

Chair: Huan Liu

25-minute presentations

  • The Structure of Information Pathways in a Social Communication Network. G. Kossinets, J. Kleinberg, D. Watts.
  • Influence and Correlation in Social Networks. A. Anagnostopoulos, R. Kumar, M. Mahdian.
  • Weighted Graphs and Disconnected Components. M. McGlohon, L. Akoglu, C. Faloutsos.

15-minute presentations

  • Microscopic Evolution of Social Networks. J. Leskovec, L. Backstrom, R. Kumar, A. Tomkins.
  • Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. M. Seshadri, S. Machiraju, A. Sridharan, J. Bolot, C. Faloutsos, J. Leskovec.
  • Feedback Effects between Similarity and Social Influence in Online Communities. D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, S. Suri.

Combined Session 2: Text Mining

Monday 10:30 am - 12:30 pm, Baraka

Chair: Gregory Piatetsky-Shapiro

25-minute presentations

  • (R) Generating Succinct Titles for Web URLs. D. Chakrabarti, R. Kumar, K. Punera.
  • (R) Efficient Computation of Personal Aggregate Queries on Blogs. K. C. Sia, J. Cho, Y. Chi, B. L. Tseng.
  • (I) Detecting Privacy Leaks Using Corpus-Based Association Rules. R. Chow, P. Golle, J. Staddon.

15-minute presentations

  • (I) Identifying Domain Expertise of Developers from Source Code. R. Sindhgatta.
  • (I) Anticipating Annotations and Emerging Trends in Biomedical Literature. F. Mörchen, M. Dejori, D. Fradkin, J. Etienne, B. Wachmann, M. Bundschus.
  • (I) Customer Targeting Models Using Actively-Selected Web Content. P. Melville, S. Rosset, R. D. Lawrence.

Research Session 3: Statistical Methods

Monday 2:00 pm - 3:35 pm, Kenitra

Chair: Michael Mahoney

25-minute presentations

  • FastANOVA: an efficient algorithm for genome-wide association study. X. Zhang, F. Zou, W. Wang. (Best Research Paper Award Winner)
  • A Bayesian Mixture Model with Linear Regression Mixing Proportions. X. Song, C. Jermaine, S. Ranka, J. Gums.

15-minute presentations

  • Model-Based Document Clustering with a Collapsed Gibbs Sampler. D. D. Walker, E. K. Ringger.
  • Knowledge Discovery of Semantic Relationships between Words Using Nonparametric Bayesian Graph Model. I. Sato, M. Yoshida, H. Nakagawa.
  • Reconstructing Chemical Reaction Networks: Data Mining meets System Identification. Y. J. Cho, N. Ramakrishnan, Y. Cao.

Research Session 4: Graph Mining

Monday 2:00 pm - 3:35 pm, Tangier

Chair: Xifeng Yan

25-minute presentations

  • Efficient Semi-streaming Algorithms for Local Triangle Counting in Massive Graphs. L. Becchetti, P. Boldi, C. Castillo, A. Gionis.
  • Can Complex Network Metrics Predict the Behavior of NBA Teams? P. O. S. Vaz de Melo, V. A. F. Almeida, A. A. F. Loureiro.

15-minute presentations

  • Community Evolution in Dynamic Multi-Mode Networks. L. Tang, H. Liu, J. Zhang, Z. Nazeri.
  • Colibri: Fast Mining of Large Static and Dynamic Graphs. H. Tong, S. Papadimitriou, J. Sun, P. S. Yu, C. Faloutsos.
  • Partial Least Squares Regression for Graph Mining. H. Saigo, N. Krämer, K. Tsuda.

Research Session 5: Classification

Monday 2:00 pm - 3:35 pm, Rabat

Chair: Zijiang Zheng

25-minute presentations

  • Multi-class Cost-sensitive Boosting with p-norm Loss Functions. A. C. Lozano, N. Abe.
  • Training Structural SVMs with Kernels Using Sampled Cuts. C. J. Yu, T. Joachims.

15-minute presentations

  • Learning Subspace Kernels for Classification. J. Chen, S. Ji, B. Ceran, Q. Li, M. Wu, J. Ye.
  • Building Semantic Kernels for Text Classification using Wikipedia. P. Wang, C. Domeniconi.
  • Extracting Shared Subspace for Multi-label Classification. S. Ji, L. Tang, S. Yu, J. Ye.

Industry Session 1: Invited Talk & Exploiting Location Information and Geo-mining

Monday 2:00 pm - 3:35 pm, Baraka

Chair: Gabor Melli

  • Invited Talk - Thore Graepel: Large Scale Data Analysis and Modeling in Online Services and Advertising

15-minute presentations

  • Automated Cyclone Discovery and Tracking using Knowledge Sharing in Multiple Heterogeneous Satellite Data. S.-S. Ho, A. Talukder.
  • Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings. N. Koenigstein, Y. Shavitt, T. Tankel.
  • Land Cover Change Detection: A Case Study. S. Boriah, V. Kumar, M. Steinbach, C. Potter, S. Klooster.

Research Session 6: Rank and Metric Learning

Monday 4:00 pm - 5:20 pm, Kenitra

Chair: Jimeng Sun

25-minute presentations

  • Bypass Rates: Reducing Query Abandonment using Negative Inferences. A. D. Sarma, S. Gollapudi, S. Ieong.
  • Structured Metric Learning for High Dimensional Problems. J. V. Davis, I. S. Dhillon.

15-minute presentations

  • Structured Learning for Non-Smooth Ranking Losses. S. Chakrabarti, R. Khanna, U. Sawant, C. Bhattacharyya.
  • Mining Preferences from Superior and Inferior Examples. B. Jiang, J. Pei, X. Lin, D. W. Cheung, J. Han.

Research Session 7: Clustering and Distance Functions

Monday 4:00 pm - 5:20 pm, Tangier

Chair: Martin Ester

25-minute presentations

  • Finding Non-Redundant, Statistically Significant Regions in High Dimensional Data: a Novel Approach to Projected and Subspace Clustering. G. Moise, J. Sander.
  • Locality Sensitive Hash Functions Based on Concomitant Rank Order Statistics. K. Eshghi, S. Rajaram.

15-minute presentations

  • SAIL: Summation-based Incremental Learning for Information-Theoretic Clustering. J. Wu, H. Xiong, J. Chen.
  • A Family of Dissimilarity Measures between Nodes Generalizing both the Shortest-Path and the Commute-time Distances. L. Yen, M. Saerens, A. Mantrach, M. Shimbo.

Research Session 8: Streams and Evolving Data

Monday 4:00 pm - 5:20 pm, Rabat

Chair: Haixun Wang

25-minute presentations

  • Volatile Correlation Computation: A Checkpoint View. W. Zhou, H. Xiong. (Best Student Paper Award Runner-Up)
  • Constructing Comprehensive Summaries of Large Event Sequences. J. Kiernan, E. Terzi.

15-minute presentations

  • Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams. A. Bifet, R. Gavaldà.
  • Categorizing and Mining Concept Drifting Data Streams. P. Zhang, X. Zhu, Y. Shi.

Industry Session 2: Social Networks

Monday 4:00 pm - 5:20 pm, Baraka

Chair: Eric E. Bloedorn

25-minute presentations

  • ArnetMiner: Extraction and Mining of Academic Social Networks. J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su.
  • Identifying Authoritative Actors in Question-Answering Forums - The Case of Yahoo! Answers. M. Bouguessa, B. Dumoulin, S. Wang.

15-minute presentations

  • Bridging Centrality: Graph Mining from Element Level to Group Level. W. Hwang, T. Kim, M. Ramanathan, A. Zhang.
  • Experimental Comparison of Scalable Online Ad Serving. G. Wu, B. Kitts.

Research Session 9: Active and Semi-supervised Learning

Tuesday 10:30 am - 12:05 pm, Kenitra

Chair: Michael Berthold

25-minute presentations

  • Effective Label Acquisition for Collective Classification. M. Bilgic, L. Getoor. (Best Student Paper Award Winner)
  • Using Ghost Edges for Classification in Sparsely Labeled Networks. B. Gallagher, H. Tong, T. Eliassi-Rad, C. Faloutsos.

15-minute presentations

  • Semi-Supervised Approach to Rapid and Reliable Labeling of Large Data Sets. G. J. Simon, V. Kumar, Z. Zhang.
  • Knowledge Transfer via Multiple Model Local Structure Mapping. J. Gao, W. Fan, J. Jiang, J. Han.
  • Active Learning with Direct Query Construction. C. X. Ling, J. Du.

Research Session 10: Discovery and Detection

Tuesday 10:30 am - 12:05 pm, Tangier

Chair: Dejing Dou

25-minute presentations

  • Automatic Identification of Quasi-Experimental Designs for Discovering Causal Knowledge. D. D. Jensen, A. S. Fast, B. J. Taylor, M. E. Maier.
  • Discrimination-aware Data Mining. D. Pedreschi, S. Ruggieri, F. Turini.

15-minute presentations

  • Local Peculiarity Factor and Its Application in Outlier Detection. J. Yang, N. Zhong, Y. Yao, J. Wang.
  • Angle-Based Outlier Detection in High-dimensional Data. H. Kriegel, M. Schubert, A. Zimek.
  • Anomaly Pattern Detection in Categorical Datasets. K. Das, J. Schneider, D. B. Neill.

Research Session 11: Pattern Mining

Tuesday 10:30 am - 12:05 pm, Rabat

Chair: Bart Goethals

25-minute presentations

  • Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree. W. Fan, K. Zhang, H. Cheng, J. Gao, X. Yan, J. Han, P. Yu, O. Verscheure.
  • Quantitative Evaluation of Approximate Frequent Pattern Mining Algorithms. R. Gupta, G. Fang, B. Field, M. Steinbach, V. Kumar.

15-minute presentations

  • Effective and Efficient Itemset Pattern Summarization: Regression-based Approaches. R. Jin, M. Abu-Ata, Y. Xiang, N. Ruan.
  • Constraint Programming for Itemset Mining. L. De Raedt, T. Guns, S. Nijssen.
  • Succinct Summarization of Transactional Databases: An Overlapped Hyperrectangle Scheme. Y. Xiang, R. Jin, D. Fuhry, F. F. Dragan.

Industry Session 3: Invited Talk & Visual Analytics

Tuesday 10:30 am - 12:05 pm, Baraka

Chair: Volker Tresp

  • Invited Talk - Udo Miletzki: The Genesis of Postal OCR and Beyond

25-minute presentation

  • A Visual-Analytic Toolkit for Dynamic Interaction Graphs. X. Yang, S. Asur, S. Parthasarathy, S. Mehta.

15-minute presentation

  • The Persuasive Phase of Visualization. C. H. Chih, D. S. Parker.

Research Session 12: Feature Selection

Tuesday 2:00 pm - 3:20 pm, Tangier

Chair: Nina Mishra

25-minute presentations

  • On Updates that Constrain the Features' Connections During Learning. O. Madani, J. Huang.
  • Stable Feature Selection via Dense Feature Groups. L. Yu, C. Ding, S. Loscalzo.

15-minute presentations

  • Unsupervised Feature Selection for Principal Components Analysis. C. Boutsidis, M. W. Mahoney, P. Drineas.
  • FAST: A ROC-based Feature Selection Metric for Small Samples and Imbalanced Data Classification Problems. X. Chen, M. Wasikowski.

Research Session 13: Collaborative Filtering and Matrices

Tuesday 2:00 pm - 3:20 pm, Rabat

Chair: Chris Ding

25-minute presentations

  • Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Y. Koren.
  • Combinational Collaborative Filtering for Personalized Community Recommendation. W. Chen, D. Zhang, E. Y. Chang.

15-minute presentations

  • Banded Structure in Binary Matrices. G. C. Garriga, E. Junttila, H. Mannila.
  • Spectral Domain-Transfer Learning. X. Ling, W. Dai, G. Xue, Q. Yang, Y. Yu.

Research Session 14: Sequence Data

Tuesday 2:00 pm - 3:20 pm, Baraka

Chair: Jennifer Dy

25-minute presentations

  • SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models. Y. Fujiwara, Y. Sakurai, M. Yamamuro. (Best Paper Award Runner-up)
  • Efficient Ticket Routing by Resolution Sequence Mining. Q. Shao, Y. Chen, S. Tao, X. Yan, N. Anerousis.

15-minute presentations

  • iSAX: Indexing and Mining Terabyte Sized Time Series. J. Shieh, E. Keogh.
  • Permu-pattern: Discovery of Mutable Permutation Patterns with Proximity Constraint. M. Hu, J. Yang, W. Su.

Research Session 15: SIGKDD Dissertation Award Winners & Privacy

Tuesday 3:50 pm - 5:10 pm, Kenitra

Chair: Jian Pei

12-minute presentations

  • Scalable Mining and Link Analysis Across Multiple Database Relations. X. Yin.
  • Incremental Pattern Discovery on Streams, Graphs and Tensors. J. Sun.

25-minute presentation

  • The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing. J. Brickell, V. Shmatikov.

15-minute presentations

  • Composition Attacks and Auxiliary Information in Data Privacy. S. R. Ganta, S. P. Kasiviswanathan, A. Smith.
  • Anonymizing Transaction Databases for Publication. Y. Xu, K. Wang, A. W. Fu, P. S. Yu.

Research Session 16: Prediction Models

Tuesday 3:50 pm - 5:10 pm, Tangier

Chair: Tanya Berger-Wolf

25-minute presentations

  • Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers. V. S. Sheng, F. Provost, P. G. Ipeirotis. (Best Paper Award Runner-Up)
  • Stream Prediction Using A Generative Model Based On Frequent Episodes In Event Sequences. S. Laxman, V. Tankasali, R. W. White.

15-minute presentations

  • Partitioned Logistic Regression for Spam Filtering. M. Chang, W. Yih, C. Meek.
  • Asymmetric Support Vector Machines: Low False-Positive Learning Under the User Tolerance. S. Wu, K. Lin, C. Chen, M. Chen.

Combined Session 3: Performance and Scale

Tuesday 3:50 pm - 5:10 pm, Rabat

Chair: Karl Rexer

25-minute presentations

  • (R) Scaling Up Text Classification for Large File Systems. G. Forman, S. Rajaram.
  • (I) Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere. R. Grossman, Y. Gu.

15-minute presentations

  • (R) A Sequential Dual Method for Large Scale Multi-Class Linear SVMs. S. S. Keerthi, S. Sundararajan, K. Chang, C. Hsieh, C. Lin.
  • (R) Cut-And-Stitch: Efficient Parallel Learning of Linear Dynamical Systems on SMPs. L. Li, W. Fu, F. Guo, T. C. Mowry, C. Faloutsos.

Industry Session 4: Medical Data Mining

Tuesday 3:50 pm - 5:10 pm, Baraka

Chair: Ranga Raju Vatsavai

25-minute presentations

  • Heterogeneous Data Fusion for Alzheimer's Disease Study. J. Ye, K. Chen, T. Wu, J. Li, Z. Zhao, R. Patel, M. Bae, R. Janardan, H. Liu, G. Alexander, E. Reiman.
  • Temporal Pattern Discovery for Trends and Transient Effects: Its Application to Patient Records. G. N. Norén, A. Bate, J. Hopstadius, K. Star, I. R. Edwards.

15-minute presentations

  • Learning Methods for Lung Tumor Markerless Gating in Image-Guided Radiotherapy. Y. Cui, J. G. Dy, G. C. Sharp, B. M. Alexander, S. B. Jiang.
  • Privacy-Preserving Cox Regression for Survival Analysis. S. Yu, G. Fung, R. Rosales, S. Krishnan, R. B. Rao, C. Dehing-Oberije, P. Lambin.

Combined Session 4: Text Mining

Wednesday 10:30 am - 12:10 pm, Kenitra

Chair: Xiaowei Xu

25-minute presentations

  • (R) Fast Logistic Regression for Text Categorization with Variable-Length N-grams. G. Ifrim, G. Bakir, G. Weikum.
  • (R) Structured Entity Identification and Document Categorization: Two Tasks with One Joint Model. I. Bhattacharya, S. Godbole, S. Joshi.
  • (I) Text Classification, Business Intelligence, and Interactivity: Automating C-Sat Analysis for Services Industry. S. Godbole, S. Roy.
  • (R) Information Extraction from Wikipedia: Moving Down the Long Tail. F. Wu, R. Hoffmann, D. S. Weld.

Research Session 17: Partially Supervised Learning

Wednesday 10:30 am - 12:10 pm, Tangier

Chair: Dragos Margineantu

25-minute presentations

  • Learning Classifiers from Only Positive and Unlabeled Data. C. Elkan, K. Noto.
  • CutS3VM: A Fast Semi-Supervised SVM Algorithm. B. Zhao, F. Wang, C. Zhang.
  • Semi-supervised Learning with Data Calibration for Long-Term Time Series Forecasting. H. Cheng, P. Tan.
  • Classification with Partial Labels. N. Nguyen, R. Caruana.

Research Session 18: Matrix Methods

Wednesday 10:30 am - 12:10 pm, Rabat

Chair: Yan Liu

25-minute presentations

  • Interpretable Nonnegative Matrix Decompositions. S. Hyvönen, P. Miettinen, E. Terzi.
  • Relational Learning via Collective Matrix Factorization. A. P. Singh, G. J. Gordon.
  • Hypergraph Spectral Learning for Multi-label Classification. L. Sun, S. Ji, J. Ye.
  • Simultaneous Tensor Subspace Selection and Clustering: The Equivalence of High Order SVD and K-Means Clustering. H. Huang, C. Ding, D. Luo, T. Li.

Industry Session 5: Search and Commerce

Wednesday 10:30 am - 12:10 pm, Baraka

Chair: Ronny Kohavi

25-minute presentations

  • Context-Aware Query Suggestion by Mining Click-Through and Session Data. H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, H. Li. (Best Application Paper Award Winner)
  • Scalable and Near Real-Time Burst Detection from eCommerce Queries. N. Parikh, N. Sundaresan.
  • Using Predictive Analysis to Improve Invoice-to-Cash Collection. S. Zeng, P. Melville, C. A. Lang, I. Boier-Martin, C. Murphy.
  • TagMark: Reliable Estimations of RFID Tags for Business Processes. L. W. F. Chaves, E. Buchmann, K. Böhm.