All tutorials will be held on Sunday, August 24, 2008.
+ J. Han, J. Lee, H. Gonzalez, X. Li, "Mining Massive RFID, Trajectory, and
Traffic Data Sets"
Jiawei Han, Jae-Gil Lee, Hector Gonzalez, Xiaolei Li
Department of Computer Science, University of Illinois at Urbana-Champaign
With the wide availability of satellite, RFID, GPS, sensor, wireless, and video
technologies, moving-object data has been collected on a massive scale and is becoming
increasingly rich, complex, and ubiquitous. There is an imminent need for scalable
and flexible data analysis over moving-object information, and thus mining moving-object
data has become one of the major challenges in data mining. There have been considerable
research efforts on data mining for RFID, trajectory, and traffic data sets. However,
there has been no systematic tutorial on knowledge discovery from such moving-object
data sets. This tutorial presents a comprehensive, organized, and state-of-the-art
survey of methodologies and algorithms for analyzing different kinds of moving-object
data sets, with an emphasis on several important mining tasks: clustering, classification,
outlier analysis, and multidimensional analysis. Besides a thorough survey of the
recent research work on this topic, we also show how real-world applications can
benefit from data mining of RFID, trajectory, and traffic data sets. The tutorial
consists of three parts: (1) RFID data mining, (2) trajectory data mining, and (3)
traffic data mining. In the first part, warehousing, cleaning, and flow mining for
RFID data are explored. In the second part, pattern mining, clustering, classification,
and outlier detection for trajectory data are explored. In the third part, route
discovery, destination prediction, and hot-route or outlier detection for traffic
data are explored. This tutorial is prepared for data mining, database, and machine
learning researchers who are interested in moving-object data.
+ J. Neville, F. Provost, "Predictive Modeling with Social Networks"
Jennifer Neville, Purdue University
Foster Provost, New York University
Recently there has been a surge of interest in methods for analyzing complex social
networks: from communication networks, to friendship networks, to professional and
organizational networks. The dependencies among linked entities in the networks
present an opportunity to improve inference about properties of individuals, as
birds of a feather do indeed flock together. For example, when deciding how to market
a product to people in MySpace or Facebook, it may be helpful to consider whether
a person's friends are likely to purchase the product.
This tutorial will explore the unique opportunities and challenges for modeling
social network data. We will begin with a description of the problem setting, including
examples of various applications of social network mining (e.g., marketing, fraud
detection). We will then present a number of characteristics of social network data
that differentiate it from traditional inference and learning settings, and outline
the resulting opportunities for significantly improved inference and learning. We
will discuss specific techniques for capitalizing on each of the opportunities in
statistical models, and outline both methodological issues and potential modeling
pathologies that are unique to network data. We will give links to the recent literature
to guide study, and present results demonstrating the effectiveness of the techniques.
Prerequisites: The tutorial assumes a basic knowledge of AI-style
inference and machine learning, equivalent to an introductory graduate or advanced
+ J. Pei, M. Hua, Y. Tao, X. Lin, "Mining Uncertain and Probabilistic Data:
Problems, Challenges, Methods, and Applications"
Jian Pei, Simon Fraser University, Canada
Ming Hua, Simon Fraser University, Canada
Yufei Tao, The Chinese University of Hong Kong
Xuemin Lin, The University of New South Wales, Australia
Uncertain data are inherent in some important applications, such as environmental
surveillance, market analysis, and quantitative economics research. Uncertain data
in those applications are generally caused by factors like data randomness and incompleteness,
limitations of measuring equipment, delayed data updates, etc. Due to the importance
of those applications and the rapidly increasing amount of uncertain data collected
and accumulated, analyzing and mining large collections of uncertain data has become
an important task and has attracted growing interest from the data mining community.
In this tutorial, we will give a systematic survey on the motivations/applications,
the problems, the challenges, the fundamental principles and the state-of-the-art
methods of mining uncertain and probabilistic data. We will motivate the survey
with several interesting practical applications of uncertain data analysis. To set
the stage, we will briefly discuss two major models for uncertain and probabilistic
data. We will cover several important data mining tasks on uncertain data, including
clustering, classification, frequent pattern mining and online analytical processing
(OLAP). For each task, we will analyze the challenges posed by uncertain and probabilistic
data and the state-of-the-art solutions.
+ H. Kriegel, P. Kröger, A. Zimek, "Detecting Clusters in Moderate-to-High
Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation
Hans-Peter Kriegel, Peer Kröger, and Arthur Zimek
Institute for Informatics, Ludwig-Maximilians-Universität München, Germany
This tutorial provides a comprehensive and comparative overview of a broad range
of state-of-the-art algorithms for finding clusters in moderate-to-high-dimensional
data. It sketches important applications of the introduced methods, outlines the
general challenges these algorithms have to cope with, and presents a taxonomy of
existing approaches. In addition, relationships between the algorithmic approaches
of each category of the taxonomy are discussed. The intended audience of this tutorial
ranges from novice researchers to advanced experts as well as practitioners from
any application domain dealing with high-dimensional data.
+ H. Liu and N. Agarwal, "Blogosphere: Research Issues, Applications, and
Huan Liu, Arizona State University
Nitin Agarwal, Arizona State University
The objective of this tutorial is to give a comprehensive overview of the techniques,
applications, and research issues in the blogosphere. Weblogs, or blogs, have enabled
people to express their thoughts, voice their opinions, and share their experiences
and ideas. Individuals experience a sense of community, a feeling of belonging,
and a bond in which members matter to one another and their niche needs are met through
online interactions. The blogosphere's open standards and low barrier to publication have
transformed information consumers into producers. This has created a plethora of open-source
intelligence, or "collective wisdom", that acts as a storehouse of overwhelming amounts of
knowledge about the members, their environment, and the symbiosis between them. Nonetheless,
vast amounts of this knowledge remain to be discovered and exploited in the
most suitable way. In this tutorial, we introduce current and state-of-the-art research
issues, review some key elements of research such as tools and methodologies in
the blogosphere, and present a case study of identifying the influential bloggers in
a community to exemplify the integration of some major aspects discussed in this
+ X. Yan & K. Borgwardt, "Graph Mining and Graph Kernels"
Karsten Borgwardt, University of Cambridge
Xifeng Yan, IBM T.J. Watson Research Center, New York
Social and biological networks have led to a huge interest in
data analysis on graphs. Various groups within the KDD community have begun to study
the task of data mining on graphs, including researchers from database-oriented
graph mining, and researchers from kernel machine learning. Their approaches are
often complementary, and we feel that exciting research problems and techniques
can be discovered by exploring the link between these different approaches to graph
This tutorial presents a comprehensive overview of the techniques developed in graph
mining and graph kernels and examines the connection between them. The goals of this
tutorial are i) to introduce newcomers to the field of graph mining, ii) to introduce
people with a database background to graph mining using kernel machines, iii) to introduce
people with a machine learning background to database-oriented graph mining, and iv)
to present exciting research problems at the interface of both fields.
+ R. Feldman, L. Ungar, "Applied Text Mining"
Ronen Feldman, Hebrew University
Lyle Ungar, University of Pennsylvania
The information age has made it easy to store large amounts of data. The proliferation
of documents available on the Web, on corporate intranets, on news wires, and elsewhere
is overwhelming. However, while the amount of data available to us is constantly
increasing, our ability to absorb and process this information remains constant.
Search engines only exacerbate the problem by making more and more documents available
in a matter of a few keystrokes. Text Mining is an exciting research area that
tries to solve the information overload problem by using techniques from data mining,
machine learning, NLP, IR and knowledge management. Text Mining involves the preprocessing
of document collections (text categorization, information extraction, term extraction),
the storage of the intermediate representations, the techniques to analyze these
intermediate representations (distribution analysis, clustering, trend analysis,
association rules, etc.), and visualization of the results. In this tutorial we will
present the general theory of Text Mining and will demonstrate several systems that
use these principles to enable interactive exploration of large textual collections.
We will present a general architecture for text mining and will outline the algorithms
and data structures behind the systems. Special emphasis will be given to lessons
learned from years of experience in developing real-world text mining systems. The
tutorial will cover the state of the art in this rapidly growing area of research.
Several real-world applications of text mining will be presented.
- Proposals due: March 15, 2008
- Notification of Acceptance: May 25, 2008
Call for Tutorial Proposals
KDD'08 will host tutorials covering topics in data mining of interest to the research
community as well as application developers. The tutorials will be part of the main
conference technical program, and are free of charge to the attendees of the conference.
We invite proposals for half-day tutorials from active researchers and experienced
tutors. Ideally, a tutorial will cover the state-of-the-art research, development
and applications in a specific data mining direction, and stimulate and facilitate
future work. Tutorials on interdisciplinary directions, novel and fast-growing directions,
and significant applications are highly encouraged.
A tutorial proposal should be organized into the following sections.
- Abstract (up to 150 words)
- Target audience and prerequisites. Proposals must clearly identify the intended
audience for the tutorial (e.g., novice users of statistical techniques, or expert
researchers in text mining). What background will be required of the audience? Why
is this topic important/interesting to the KDD community? What is the benefit to
- Outline of the tutorial. Enough material should be included to provide a sense of
both the scope of material to be covered and the depth to which it will be covered.
The more details that can be provided, the better (up to and including links to
the actual slides). Note that the tutors should NOT focus mainly on their own research
results. A KDD tutorial is not a forum for promoting one's research or product.
- A list of forums, with their times and locations, where the tutorial or a similar/highly
related tutorial has been presented by the same author(s) before, highlighting
the similarities/differences between those and the one proposed for KDD'08 (up to 100
words for each entry)
- A list of tutorials on the same/similar/highly related topics given by other people,
and highlight the difference between yours and theirs (up to 100 words for each
- A list of other tutorials given by the authors. Please list the titles, the presenters,
and the forums only.
- Tutors' short bio and their expertise related to the tutorial (up to 200 words per
- A list of up to 20 most important references that will be covered in the tutorial
- (Optional) URLs of the slides/notes of the previous tutorials given by the authors,
and any specific audio/video/computer requirements for the tutorial.
Please send your submission to firstname.lastname@example.org