|
General Schedule
Saturday
|
|
5:00 pm - 9:00 pm
|
Registration (Medinas A Foyer)
|
Sunday
|
|
7:30 am - 8:00 pm
|
Registration (Medinas A Foyer)
|
9:00 am - 5:00 pm
|
Full Day Workshop W1 - ADKDD'08 (Casablanca B)
Full Day Workshop W2 - WEBKDD'08 (Casablanca A)
Full Day Workshop W3 - Sensor-KDD (Rabat B)
Full Day Workshop W4 - PinKDD'08 (Rabat A)
Full Day Workshop W5 - SNA-KDD (Tangier)
Full Day Workshop W13 - Multimedia Data Mining (Kenitra B)
|
9:00 am - 12:00 pm
|
Half Day Workshop W6 - KDD Cup and Mining Medical Data (Kenitra A)
Half Day Workshop W7 - Multiple Information Sources (Casablanca F)
Half Day Workshop W11 - BIOKDD08 (Agadir)
Half Day Workshop W12 - Mining for Business Applications (Fez)
|
9:00 am - 12:00 pm
|
Tutorial - Mining Massive RFID, Trajectory, and Traffic Data Sets (Casablanca H)
Tutorial - Predictive Modeling with Social Networks (Baraka)
Tutorial - Mining Uncertain and Probabilistic Data: Problems, Challenges, Methods, and Applications (Casablanca C)
Tutorial - Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering (Casablanca G)
|
10:00 am - 10:30 am
|
Coffee Break (Medinas A Foyer, Medinas D)
|
12:00 pm - 2:00 pm
|
Lunch (on your own)
|
2:00 pm - 5:30 pm
|
Half Day Workshop W8 - Large Scale Recommender Systems and Netflix Prize (Fez)
Half Day Workshop W10 - Mining using Matrices and Tensors (Agadir)
|
2:00 pm - 5:00 pm
|
Tutorial - Blogosphere: Research Issues, Applications, and Tools (Kenitra A)
Tutorial - Graph Mining and Graph Kernels (Baraka)
Tutorial - Applied Text Mining (Casablanca C)
|
3:00 pm - 3:30 pm
|
Coffee Break (Medinas A Foyer, Medinas D)
|
6:00 pm - 6:15 pm
|
Opening Remarks (Casablanca North)
|
6:15 pm - 6:45 pm
|
Award Presentations (Casablanca North)
|
6:45 pm - 7:30 pm
|
Innovation Award Talk (Casablanca North)
|
Monday
|
|
7:30 am - 8:00 pm
|
Registration (Medinas A Foyer)
|
7:30 am - 9:00 am
|
Continental Breakfast (Medinas A Foyer, Medinas D)
|
8:00 am - 6:00 pm
|
Exhibits (Casablanca South)
|
9:00 am - 10:00 am
|
Plenary Invited Talk - Trevor Hastie
(Casablanca North)
|
10:00 am - 10:30 am
|
Coffee Break (Medinas Foyer, Casablanca South)
|
10:30 am - 12:30 pm
|
Combined Session 1: Topic Modeling
Combined Session 2: Data Integration
Research Session 1: Social Networks
Research Session 2: Text Mining
|
12:30 pm - 2:00 pm
|
Conference Lunch (Casablanca North)
Sponsored by Microsoft adCenter Labs
|
2:00 pm - 3:35 pm
|
Research Session 3: Statistical Methods
Research Session 4: Graph Mining
Research Session 5: Classification
Industry Session 1: Invited Talk & Exploiting Location Information and Geo-mining
Invited Talk - Thore Graepel
|
3:35 pm - 4:00 pm
|
Coffee Break (Medinas Foyer, Casablanca South)
|
4:00 pm - 5:20 pm
|
Research Session 6: Rank and Metric Learning
Research Session 7: Clustering and Distance Functions
Research Session 8: Streams and Evolving Data
Industry Session 2: Social Networks
|
6:15 pm - 8:45 pm
|
Poster Reception I & Demo Session (Casablanca North)
Sponsored by Oracle
|
Tuesday
|
|
7:30 am - 5:00 pm
|
Registration (Medinas A Foyer)
|
7:30 am - 9:00 am
|
Continental Breakfast (Medinas A Foyer, Medinas D)
|
8:00 am - 6:00 pm
|
Exhibits (Casablanca South)
|
9:00 am - 10:00 am
|
Plenary Invited Talk - Michael Schwarz
(Casablanca North)
|
10:00 am - 10:30 am
|
Coffee Break (Medinas Foyer, Casablanca South)
|
10:30 am - 12:05 pm
|
Research Session 9: Active and Semi-supervised Learning
Research Session 10: Discovery and Detection
Research Session 11: Pattern Mining
Industry Session 3: Invited Talk & Visual Analytics
Invited Talk - Udo Miletzki
|
12:05 pm - 2:00 pm
|
SIGKDD Business Lunch
Sponsored by Yahoo!
|
2:00 pm - 3:20 pm
|
Research Session 12: Feature Selection
Research Session 13: Collaborative Filtering and Matrices
Research Session 14: Sequence Data
|
2:00 pm - 3:20 pm
|
Panel - Social Networks: Looking Ahead (Kenitra)
|
3:20 pm - 3:50 pm
|
Coffee Break (Medinas Foyer, Casablanca South)
|
3:50 pm - 5:10 pm
|
Research Session 15: SIGKDD Dissertation Award Winners & Privacy
Research Session 16: Prediction Models
Combined Session 3: Performance and Scale
Industry Session 4: Medical Data Mining
|
5:15 pm - 6:15 pm
|
KDD Transfer Meeting (Agadir)
(SIGKDD-2008 and SIGKDD-2009 Organizers only)
|
5:30 pm - 8:00 pm
|
Poster Reception II & Demo Session (Ballroom)
Sponsored by Netflix
|
8:00 pm - 9:30 pm
|
Program Committee Dinner (Wynn Las Vegas)
|
Wednesday
|
|
7:30 am - 9:00 am
|
Continental Breakfast (Medinas A Foyer, Medinas D)
|
9:00 am - 10:00 am
|
Plenary Invited Talk - Jitendra Malik
(Casablanca North)
|
10:00 am - 10:30 am
|
Coffee Break (Medinas A Foyer, Medinas D)
|
10:30 am - 12:10 pm
|
Combined Session 4: Text Mining
Research Session 17: Partially Supervised Learning
Research Session 18: Matrix Methods
Industry Session 5: Search and Commerce
|
12:10 pm - 12:30 pm
|
Closing Remarks (Casablanca North)
|
Invited Talks
Trevor Hastie, Stanford University
Regularization Paths and Coordinate Descent
Chair: Sunita Sarawagi
Abstract
In a statistical world faced with an explosion of data, regularization has become
an important ingredient. In many problems, we have many more variables than observations,
and the lasso penalty and its hybrids have become increasingly useful. This talk
presents some effective algorithms based on coordinate descent for fitting large-scale
regularization paths for a variety of problems. Joint work with Rob Tibshirani
and Jerome Friedman.
Michael Schwarz, Yahoo! Research
Internet Advertising and Optimal Auction Design
Chair: Ying Li
Abstract
We characterize the optimal (revenue maximizing) auction for sponsored search advertising.
We show that a search engine's optimal reserve price is independent of the number
of bidders. Using simulations, we consider the changes that result from a search
engine's choice of reserve price and from changes in the number of participating
advertisers.
Jitendra Malik, UC Berkeley
The Future of Image Search
Chair: Bing Liu
Abstract
There are billions of images on the Internet. Today, searching for a desired image
is largely based on textual data such as filename or associated text on the web
page; not much use is made of the image content. There are good reasons for this.
The field of content-based image retrieval, which emerged during the 1990s, focused
primarily on color and texture cues. These were easier to model than shape, but
they turned out to be much less useful than originally hoped. I shall review some
of the recent developments in the field of visual object recognition in the computer
vision community that offer greater promise. Much better image features for characterizing
shape, advances in machine learning techniques, and the availability of large amounts
of training data lie at the heart of these approaches.
Thore Graepel, Microsoft Research
Large Scale Data Analysis and Modeling in Online Services and Advertising
Abstract
The last five years have seen a tremendous growth in online search, advertising
and gaming services. Today, it is extremely important to analyse large collections
of user interaction data as a first step in building predictive models for these
services. In this talk we will report on two applications of large scale data analysis
performed at Microsoft Research and how they guided model development:
- We will present the unique challenges involved in building a new advertisement ranking
algorithm, starting with the near real-time analysis of click-through logs covering weeks
of data. For this task, we created type-safe and very fast procedures to build a
data store of click-through meta-information about users and advertisements, which
then guided the development of features for training a Bayesian click-through estimation
algorithm. We will discuss how system issues such as memory consumption and algorithmic
performance influenced the modelling process. We will also discuss the issue of
scientific programming languages capable of dealing with CPU-intensive tasks while
allowing rapid prototyping, and give a quick overview of F#, a functional programming
language ideally suited for this task.
- In the second half, we will give an insight into the data analysis and modelling
tasks that went into the development of Halo 3's online ranking and matchmaking
algorithm. At its core, Halo 3 uses the well-known TrueSkill ranking and matchmaking
system, but before its launch we performed thousands of simulations of ranking behaviour
on over 3,000,000 players, varying speed of convergence, skill-level display and
other parameters. We will also discuss the limitations of this simulation and present
results on how the live online part of the game compares with the simulations today.
As of today, over 800,000 unique players play over 2,000,000 Halo 3 matches every
24 hours.
Udo Miletzki, Siemens AG
The Genesis of Postal OCR and Beyond
Abstract
We provide an overview of the world's largest industrial OCR application: Postal Address
Reading. We will talk about its humble beginnings, elaborate on how it evolved
rapidly into high-tech machinery, and discuss its future prospects. Some prominent
historical, system, methodological, cultural and social aspects will also be
illuminated.
Every day, millions of mail pieces are automatically sorted and distributed by
a powerful fleet of readers, which recognize millions of characters and words
per second and recombine them into meaningful and valid addresses. Cheques and paper
forms will vanish sooner or later, since they can be completely replaced by electronic
cash flow and e-forms. Mail, however, will persist and even grow in volume for three
good reasons: First, mail is conjoined with goods and material, especially in the
era of web-shopping. Second, postal services are the only service that spans the whole
world, reaching even its most remote places. Third, postal services will undergo
a hybridization process, which means that mail and email will fuse into hybrid mail.
Hybrid mail will reach the recipient in the appropriate form according to the recipient's
preferences, no matter whether it was sent as a letter, fax or email.
|