Updates: there are two changes to the session chairs. Research Session 14: Sequence Data,
originally chaired by Toon Calders, will be chaired by Jennifer Dy. Research Session 13:
Collaborative Filtering and Matrices, originally chaired by Naren Ramakrishnan, will be chaired by
Chris Ding.
Combined Session 1: Topic Modeling
Monday 10:30 am - 12:30 pm, Kenitra
Chair: Ken Church
25-minute presentations
- (I) Learning from Multi-Topic Web Documents for Contextual Advertising.
Y. Zhang, A. C. Surendran, J. C. Platt, M. Narasimhan. (Best Application Paper
Award Runner-up)
- (R) Probabilistic Latent Semantic Visualization: Topic Model for Visualizing
Documents. T. Iwata, T. Yamada, N. Ueda.
- (R) Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation. I. Porteous,
D. Newman, A. Ihler, A. Asuncion, P. Smyth, M. Welling.
15-minute presentations
- (R) Mining Multi-Faceted Overviews of Arbitrary Topics in a Text Collection.
X. Ling, Q. Mei, C. Zhai, B. Schatz.
- (R) Joint Latent Topic Models for Text and Citations. R. Nallapati, A.
Ahmed, E. P. Xing, W. W. Cohen.
- (R) Topical Query Decomposition. F. Bonchi, C. Castillo, D. Donato, A.
Gionis.
Research Session 1: Data Integration
Monday 10:30 am - 12:30 pm, Tangier
Chair: Hui Xiong
25-minute presentations
- A Unified Approach for Schema Matching, Coreference and Canonicalization.
M. Wick, K. Rohanimanesh, K. Schultz, A. McCallum.
- De-duping URLs via Rewrite Rules. A. Dasgupta, R. Kumar, A. Sasturkar.
- Unsupervised Deduplication using Cross-Field Dependencies. R. Hall, C.
Sutton, A. McCallum.
15-minute presentations
- Automatic Record Linkage using Seeded Nearest Neighbour and Support Vector Machine
Classification. P. Christen.
- Entity Categorization Over Large Document Collections. V. Ganti, A. C.
König, R. Vernica.
- Identifying Biologically Relevant Genes via Multiple Heterogeneous Data Sources.
Z. Zhao, J. Wang, H. Liu, J. Ye, Y. Chang.
Research Session 2: Social Networks
Monday 10:30 am - 12:30 pm, Rabat
Chair: Huan Liu
25-minute presentations
- The Structure of Information Pathways in a Social Communication Network.
G. Kossinets, J. Kleinberg, D. Watts.
- Influence and Correlation in Social Networks. A. Anagnostopoulos, R. Kumar,
M. Mahdian.
- Weighted Graphs and Disconnected Components. M. McGlohon, L. Akoglu, C.
Faloutsos.
15-minute presentations
- Microscopic Evolution of Social Networks. J. Leskovec, L. Backstrom, R.
Kumar, A. Tomkins.
- Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. M. Seshadri,
S. Machiraju, A. Sridharan, J. Bolot, C. Faloutsos, J. Leskovec.
- Feedback Effects between Similarity and Social Influence in Online Communities.
D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, S. Suri.
Combined Session 2: Text Mining
Monday 10:30 am - 12:30 pm, Baraka
Chair: Gregory Piatetsky-Shapiro
25-minute presentations
- (R) Generating Succinct Titles for Web URLs. D. Chakrabarti, R. Kumar,
K. Punera.
- (R) Efficient Computation of Personal Aggregate Queries on Blogs. K. C.
Sia, J. Cho, Y. Chi, B. L. Tseng.
- (I) Detecting Privacy Leaks Using Corpus-Based Association Rules. R. Chow,
P. Golle, J. Staddon.
15-minute presentations
- (I) Identifying Domain Expertise of Developers from Source Code. R. Sindhgatta.
- (I) Anticipating Annotations and Emerging Trends in Biomedical Literature.
F. Mörchen, M. Dejori, D. Fradkin, J. Etienne, B. Wachmann, M. Bundschus.
- (I) Customer Targeting Models Using Actively-Selected Web Content. P. Melville,
S. Rosset, R. D. Lawrence.
Research Session 3: Statistical Methods
Monday 2:00 pm - 3:35 pm, Kenitra
Chair: Michael Mahoney
25-minute presentations
- FastANOVA: an efficient algorithm for genome-wide association study. X.
Zhang, F. Zou, W. Wang. (Best Research Paper Award Winner)
- A Bayesian Mixture Model with Linear Regression Mixing Proportions. X.
Song, C. Jermaine, S. Ranka, J. Gums.
15-minute presentations
- Model-Based Document Clustering with a Collapsed Gibbs Sampler. D. D. Walker,
E. K. Ringger.
- Knowledge Discovery of Semantic Relationships between Words Using Nonparametric
Bayesian Graph Model. I. Sato, M. Yoshida, H. Nakagawa.
- Reconstructing Chemical Reaction Networks: Data Mining meets System Identification.
Y. J. Cho, N. Ramakrishnan, Y. Cao.
Research Session 4: Graph Mining
Monday 2:00 pm - 3:35 pm, Tangier
Chair: Xifeng Yan
25-minute presentations
- Efficient Semi-streaming Algorithms for Local Triangle Counting in Massive Graphs.
L. Becchetti, P. Boldi, C. Castillo, A. Gionis.
- Can Complex Network Metrics Predict the Behavior of NBA Teams? P. O. S.
Vaz de Melo, V. A. F. Almeida, A. A. F. Loureiro.
15-minute presentations
- Community Evolution in Dynamic Multi-Mode Networks. L. Tang, H. Liu, J.
Zhang, Z. Nazeri.
- Colibri: Fast Mining of Large Static and Dynamic Graphs. H. Tong, S. Papadimitriou,
J. Sun, P. S. Yu, C. Faloutsos.
- Partial Least Squares Regression for Graph Mining. H. Saigo, N. Krämer,
K. Tsuda.
Research Session 5: Classification
Monday 2:00 pm - 3:35 pm, Rabat
Chair: Zijiang Zheng
25-minute presentations
- Multi-class Cost-sensitive Boosting with p-norm Loss Functions. A. C. Lozano,
N. Abe.
- Training Structural SVMs with Kernels Using Sampled Cuts. C. J. Yu, T.
Joachims.
15-minute presentations
- Learning Subspace Kernels for Classification. J. Chen, S. Ji, B. Ceran,
Q. Li, M. Wu, J. Ye.
- Building Semantic Kernels for Text Classification using Wikipedia. P. Wang,
C. Domeniconi.
- Extracting Shared Subspace for Multi-label Classification. S. Ji, L. Tang,
S. Yu, J. Ye.
Industry Session 1: Invited Talk & Exploiting Location Information and Geo-mining
Monday 2:00 pm - 3:35 pm, Baraka
Chair: Gabor Melli
- Invited Talk - Thore Graepel: Large Scale Data Analysis and Modeling in Online Services
and Advertising
15-minute presentations
- Automated Cyclone Discovery and Tracking using Knowledge Sharing in Multiple
Heterogeneous Satellite Data. S.-S. Ho, A. Talukder.
- Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings.
N. Koenigstein, Y. Shavitt, T. Tankel.
- Land Cover Change Detection: A Case Study. S. Boriah, V. Kumar, M. Steinbach,
C. Potter, S. Klooster.
Research Session 6: Rank and Metric Learning
Monday 4:00 pm - 5:20 pm, Kenitra
Chair: Jimeng Sun
25-minute presentations
- Bypass Rates: Reducing Query Abandonment using Negative Inferences. A.
D. Sarma, S. Gollapudi, S. Ieong.
- Structured Metric Learning for High Dimensional Problems. J. V. Davis,
I. S. Dhillon.
15-minute presentations
- Structured Learning for Non-Smooth Ranking Losses. S. Chakrabarti, R. Khanna,
U. Sawant, C. Bhattacharyya.
- Mining Preferences from Superior and Inferior Examples. B. Jiang, J. Pei,
X. Lin, D. W. Cheung, J. Han.
Research Session 7: Clustering and Distance Functions
Monday 4:00 pm - 5:20 pm, Tangier
Chair: Martin Ester
25-minute presentations
- Finding Non-Redundant, Statistically Significant Regions in High Dimensional
Data: a Novel Approach to Projected and Subspace Clustering. G. Moise, J. Sander.
- Locality Sensitive Hash Functions Based on Concomitant Rank Order Statistics.
K. Eshghi, S. Rajaram.
15-minute presentations
- SAIL: Summation-based Incremental Learning for Information-Theoretic Clustering.
J. Wu, H. Xiong, J. Chen.
- A Family of Dissimilarity Measures between Nodes Generalizing both the Shortest-Path
and the Commute-time Distances. L. Yen, M. Saerens, A. Mantrach, M. Shimbo.
Research Session 8: Streams and Evolving Data
Monday 4:00 pm - 5:20 pm, Rabat
Chair: Haixun Wang
25-minute presentations
- Volatile Correlation Computation: A Checkpoint View. W. Zhou, H. Xiong.
(Best Student Paper Award Runner-up)
- Constructing Comprehensive Summaries of Large Event Sequences. J. Kiernan,
E. Terzi.
15-minute presentations
- Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams.
A. Bifet, R. Gavaldà.
- Categorizing and Mining Concept Drifting Data Streams. P. Zhang, X. Zhu,
Y. Shi.
Industry Session 2: Social Networks
Monday 4:00 pm - 5:20 pm, Baraka
Chair: Eric E. Bloedorn
25-minute presentations
- ArnetMiner: Extraction and Mining of Academic Social Networks. J. Tang,
J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su.
- Identifying Authoritative Actors in Question-Answering Forums - The Case of
Yahoo! Answers. M. Bouguessa, B. Dumoulin, S. Wang.
15-minute presentations
- Bridging Centrality: Graph Mining from Element Level to Group Level. W.
Hwang, T. Kim, M. Ramanathan, A. Zhang.
- Experimental Comparison of Scalable Online Ad Serving. G. Wu, B. Kitts.
Research Session 9: Active and Semi-supervised Learning
Tuesday 10:30 am - 12:05 pm, Kenitra
Chair: Michael Berthold
25-minute presentations
- Effective Label Acquisition for Collective Classification. M. Bilgic, L.
Getoor. (Best Student Paper Award Winner)
- Using Ghost Edges for Classification in Sparsely Labeled Networks. B. Gallagher,
H. Tong, T. Eliassi-Rad, C. Faloutsos.
15-minute presentations
- Semi-Supervised Approach to Rapid and Reliable Labeling of Large Data Sets.
G. J. Simon, V. Kumar, Z. Zhang.
- Knowledge Transfer via Multiple Model Local Structure Mapping. J. Gao,
W. Fan, J. Jiang, J. Han.
- Active Learning with Direct Query Construction. C. X. Ling, J. Du.
Research Session 10: Discovery and Detection
Tuesday 10:30 am - 12:05 pm, Tangier
Chair: Dejing Dou
25-minute presentations
- Automatic Identification of Quasi-Experimental Designs for Discovering Causal
Knowledge. D. D. Jensen, A. S. Fast, B. J. Taylor, M. E. Maier.
- Discrimination-aware Data Mining. D. Pedreschi, S. Ruggieri, F. Turini.
15-minute presentations
- Local Peculiarity Factor and Its Application in Outlier Detection. J. Yang,
N. Zhong, Y. Yao, J. Wang.
- Angle-Based Outlier Detection in High-dimensional Data. H. Kriegel, M.
Schubert, A. Zimek.
- Anomaly Pattern Detection in Categorical Datasets. K. Das, J. Schneider,
D. B. Neill.
Research Session 11: Pattern Mining
Tuesday 10:30 am - 12:05 pm, Rabat
Chair: Bart Goethals
25-minute presentations
- Direct Mining of Discriminative and Essential Frequent Patterns via Model-based
Search Tree. W. Fan, K. Zhang, H. Cheng, J. Gao, X. Yan, J. Han, P. Yu, O.
Verscheure.
- Quantitative Evaluation of Approximate Frequent Pattern Mining Algorithms.
R. Gupta, G. Fang, B. Field, M. Steinbach, V. Kumar.
15-minute presentations
- Effective and Efficient Itemset Pattern Summarization: Regression-based Approaches.
R. Jin, M. Abu-Ata, Y. Xiang, N. Ruan.
- Constraint Programming for Itemset Mining. L. De Raedt, T. Guns, S. Nijssen.
- Succinct Summarization of Transactional Databases: An Overlapped Hyperrectangle
Scheme. Y. Xiang, R. Jin, D. Fuhry, F. F. Dragan.
Industry Session 3: Invited Talk & Visual Analytics
Tuesday 10:30 am - 12:05 pm, Baraka
Chair: Volker Tresp
- Invited Talk - Udo Miletzki: The Genesis of Postal OCR and Beyond
25-minute presentation
- A Visual-Analytic Toolkit for Dynamic Interaction Graphs. X. Yang, S. Asur,
S. Parthasarathy, S. Mehta.
15-minute presentation
- The Persuasive Phase of Visualization. C. H. Chih, D. S. Parker.
Research Session 12: Feature Selection
Tuesday 2:00 pm - 3:20 pm, Tangier
Chair: Nina Mishra
25-minute presentations
- On Updates that Constrain the Features' Connections During Learning. O.
Madani, J. Huang.
- Stable Feature Selection via Dense Feature Groups. L. Yu, C. Ding, S. Loscalzo.
15-minute presentations
- Unsupervised Feature Selection for Principal Components Analysis. C. Boutsidis,
M. W. Mahoney, P. Drineas.
- FAST: A ROC-based Feature Selection Metric for Small Samples and Imbalanced
Data Classification Problems. X. Chen, M. Wasikowski.
Research Session 13: Collaborative Filtering and Matrices
Tuesday 2:00 pm - 3:20 pm, Rabat
Chair: Chris Ding
25-minute presentations
- Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering
Model. Y. Koren.
- Combinational Collaborative Filtering for Personalized Community Recommendation.
W. Chen, D. Zhang, E. Y. Chang.
15-minute presentations
- Banded Structure in Binary Matrices. G. C. Garriga, E. Junttila, H. Mannila.
- Spectral Domain-Transfer Learning. X. Ling, W. Dai, G. Xue, Q. Yang, Y.
Yu.
Research Session 14: Sequence Data
Tuesday 2:00 pm - 3:20 pm, Baraka
Chair: Jennifer Dy
25-minute presentations
- SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models.
Y. Fujiwara, Y. Sakurai, M. Yamamuro. (Best Paper Award Runner-up)
- Efficient Ticket Routing by Resolution Sequence Mining. Q. Shao, Y. Chen,
S. Tao, X. Yan, N. Anerousis.
15-minute presentations
- iSAX: Indexing and Mining Terabyte Sized Time Series. J. Shieh, E. Keogh.
- Permu-pattern: Discovery of Mutable Permutation Patterns with Proximity Constraint.
M. Hu, J. Yang, W. Su.
Research Session 15: SIGKDD Dissertation Award Winners & Privacy
Tuesday 3:50 pm - 5:10 pm, Kenitra
Chair: Jian Pei
12-minute presentations
- Scalable Mining and Link Analysis Across Multiple Database Relations. X.
Yin.
- Incremental Pattern Discovery on Streams, Graphs and Tensors. J. Sun.
25-minute presentation
- The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing.
J. Brickell, V. Shmatikov.
15-minute presentations
- Composition Attacks and Auxiliary Information in Data Privacy. S. R. Ganta,
S. P. Kasiviswanathan, A. Smith.
- Anonymizing Transaction Databases for Publication. Y. Xu, K. Wang, A. W.
Fu, P. S. Yu.
Research Session 16: Prediction Models
Tuesday 3:50 pm - 5:10 pm, Tangier
Chair: Tanya Berger-Wolf
25-minute presentations
- Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy
Labelers. V. S. Sheng, F. Provost, P. G. Ipeirotis. (Best Paper Award Runner-up)
- Stream Prediction Using A Generative Model Based On Frequent Episodes In Event
Sequences. S. Laxman, V. Tankasali, R. W. White.
15-minute presentations
- Partitioned Logistic Regression for Spam Filtering. M. Chang, W. Yih, C.
Meek.
- Asymmetric Support Vector Machines: Low False-Positive Learning Under the User
Tolerance. S. Wu, K. Lin, C. Chen, M. Chen.
Combined Session 3: Performance and Scale
Tuesday 3:50 pm - 5:10 pm, Rabat
Chair: Karl Rexer
25-minute presentations
- (R) Scaling Up Text Classification for Large File Systems. G. Forman, S.
Rajaram.
- (I) Data Mining Using High Performance Data Clouds: Experimental Studies Using
Sector and Sphere. R. Grossman, Y. Gu.
15-minute presentations
- (R) A Sequential Dual Method for Large Scale Multi-Class Linear SVMs. S.
S. Keerthi, S. Sundararajan, K. Chang, C. Hsieh, C. Lin.
- (R) Cut-And-Stitch: Efficient Parallel Learning of Linear Dynamical Systems
on SMPs. L. Li, W. Fu, F. Guo, T. C. Mowry, C. Faloutsos.
Industry Session 4: Medical Data Mining
Tuesday 3:50 pm - 5:10 pm, Baraka
Chair: Ranga Raju Vatsavai
25-minute presentations
- Heterogeneous Data Fusion for Alzheimer's Disease Study. J. Ye, K. Chen,
T. Wu, J. Li, Z. Zhao, R. Patel, M. Bae, R. Janardan, H. Liu, G. Alexander, E. Reiman.
- Temporal Pattern Discovery for Trends and Transient Effects: Its Application
to Patient Records. G. N. Norén, A. Bate, J. Hopstadius, K. Star, I. R. Edwards.
15-minute presentations
- Learning Methods for Lung Tumor Markerless Gating in Image-Guided Radiotherapy.
Y. Cui, J. G. Dy, G. C. Sharp, B. M. Alexander, S. B. Jiang.
- Privacy-Preserving Cox Regression for Survival Analysis. S. Yu, G. Fung,
R. Rosales, S. Krishnan, R. B. Rao, C. Dehing-Oberije, P. Lambin.
Combined Session 4: Text Mining
Wednesday 10:30 am - 12:10 pm, Kenitra
Chair: Xiaowei Xu
25-minute presentations
- (R) Fast Logistic Regression for Text Categorization with Variable-Length N-grams.
G. Ifrim, G. Bakir, G. Weikum.
- (R) Structured Entity Identification and Document Categorization: Two Tasks
with One Joint Model. I. Bhattacharya, S. Godbole, S. Joshi.
- (I) Text Classification, Business Intelligence, and Interactivity: Automating
C-Sat Analysis for Services Industry. S. Godbole, S. Roy.
- (R) Information Extraction from Wikipedia: Moving Down the Long Tail. F.
Wu, R. Hoffmann, D. S. Weld.
Research Session 17: Partially Supervised Learning
Wednesday 10:30 am - 12:10 pm, Tangier
Chair: Dragos Margineantu
25-minute presentations
- Learning Classifiers from Only Positive and Unlabeled Data. C. Elkan, K.
Noto.
- CutS3VM: A Fast Semi-Supervised SVM Algorithm. B. Zhao, F. Wang, C. Zhang.
- Semi-supervised Learning with Data Calibration for Long-Term Time Series Forecasting.
H. Cheng, P. Tan.
- Classification with Partial Labels. N. Nguyen, R. Caruana.
Research Session 18: Matrix Methods
Wednesday 10:30 am - 12:10 pm, Rabat
Chair: Yan Liu
25-minute presentations
- Interpretable Nonnegative Matrix Decompositions. S. Hyvönen, P. Miettinen,
E. Terzi.
- Relational Learning via Collective Matrix Factorization. A. P. Singh, G.
J. Gordon.
- Hypergraph Spectral Learning for Multi-label Classification. L. Sun, S.
Ji, J. Ye.
- Simultaneous Tensor Subspace Selection and Clustering: The Equivalence of High
Order SVD and K-Means Clustering. H. Huang, C. Ding, D. Luo, T. Li.
Industry Session 5: Search and Commerce
Wednesday 10:30 am - 12:10 pm, Baraka
Chair: Ronny Kohavi
25-minute presentations
- Context-Aware Query Suggestion by Mining Click-Through and Session Data.
H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, H. Li. (Best Application
Paper Award Winner)
- Scalable and Near Real-Time Burst Detection from eCommerce Queries. N.
Parikh, N. Sundaresan.
- Using Predictive Analysis to Improve Invoice-to-Cash Collection. S. Zeng,
P. Melville, C. A. Lang, I. Boier-Martin, C. Murphy.
- TagMark: Reliable Estimations of RFID Tags for Business Processes. L. W.
F. Chaves, E. Buchmann, K. Böhm.