Advances in Knowledge Discovery and Data Mining: 13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, April 27-30, 2009, Proceedings by Thanaruk Theeramunkong

This book constitutes the refereed proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009, held in Bangkok, Thailand, in April 2009. The 39 revised full papers and 73 revised short papers presented together with 3 keynote talks were carefully reviewed and selected from 338 submissions. The papers present new ideas, original research results, and practical development experiences from all KDD-related areas including data mining, data warehousing, machine learning, databases, statistics, knowledge acquisition, automatic scientific discovery, data visualization, causal induction, and knowledge-based systems.
Table of Contents

Keynote Speeches.- KDD for BSN - Towards the Future of Pervasive Sensing.- Finding Hidden Structures in Relational Databases.- The Future of Search: An Online Content Perspective.

Regular Papers.- DTU: A Decision Tree for Uncertain Data.- Efficient Privacy-Preserving Link Discovery.- On Link Privacy in Randomizing Social Networks.- Sentence-Level Novelty Detection in English and Malay.- Text Categorization Using Fuzzy Proximal SVM and Distributional Clustering of Words.- Cool Blog Classification from Positive and Unlabeled Examples.- Thai Word Segmentation with Hidden Markov Model and Decision Tree.- An Efficient Method for Generating, Storing and Matching Features for Text Mining.- Robust Graph Hyperparameter Learning for Graph Based Semi-supervised Classification.- Regularized Local Reconstruction for Clustering.- Clustering with Lower Bound on Similarity.- Approximate Spectral Clustering.- An Integration of Fuzzy Association Rules and WordNet for Document Clustering.- Nonlinear Data Analysis Using a New Hybrid Data Clustering Algorithm.- A Polynomial-Delay Polynomial-Space Algorithm for Extracting Frequent Diamond Episodes from Event Sequences.- A Statistical Approach for Binary Vectors Modeling and Clustering.- Multi-resolution Boosting for Classification and Regression Problems.- Interval Data Classification under Partial Information: A Chance-Constraint Approach.- Negative Encoding Length as a Subjective Interestingness Measure for Groups of Rules.- The Studies of Mining Frequent Patterns Based on Frequent Pattern Tree.- Discovering Periodic-Frequent Patterns in Transactional Databases.- Quantifying Asymmetric Semantic Relations from Query Logs by Resource Allocation.- Acquiring Semantic Relations Using the Web for Constructing Lightweight Ontologies.- Detecting Abnormal Events via Hierarchical Dirichlet Processes.- Active Learning for Causal Bayesian Network Structure with Non-symmetrical Entropy.- A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification.- Analysis of Variational Bayesian Matrix Factorization.- Variational Bayesian Approach for Long-Term Relevance Feedback.- Detecting Link Hijacking by Web Spammers.- A Data Driven Ensemble Classifier for Credit Scoring Analysis.- A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams.- Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data.- Exploiting the Block Structure of Link Graph for Efficient Similarity Computation.- Online Feature Selection Algorithm with Bayesian ℓ1 Regularization.- Feature Selection for Local Learning Based Clustering.- RV-SVM: An Efficient Method for Learning Ranking SVM.- A Kernel Framework for Protein Residue Annotation.- Dynamic Exponential Family Matrix Factorization.- A Nonparametric Bayesian Learning Model: Application to Text and Image Categorization.

Short Papers.- Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem.- Using Highly Expressive Contrast Patterns for Classification - Is It Worthwhile?.- Arif Index for Predicting the Classification Accuracy of Features and Its Application in Heart Beat Classification Problem.- UCI++: Improved Support for Algorithm Selection Using Datasetoids.- Accurate Synthetic Generation of Realistic Personal Information.- An Efficient Approximate Protocol for Privacy-Preserving Association Rule Mining.- Information Extraction from Thai Text with Unknown Phrase Boundaries.- A Corpus-Based Approach for Automatic Thai Unknown Word Recognition using Ensemble Learning Techniques.- A Hybrid Approach to Improve Bilingual Multiword Expression Extraction.- Addressing the Variability of Natural Language Expression in Sentence Similarity with Semantic Structure of the Sentences.- Scalable Web Mining with Newistic.- Building a Text Classifier by a Keyword and Unlabeled Documents.- A Discriminative Approach to Topic-Based Citation Recommendation.- Romanization of Thai Proper Names Based on Popularity of Usages.- Budget Semi-supervised Learning.- When does Co-training Work in Real Data?.- Classification of Audio Signals Using a Bhattacharyya Kernel-Based Centroid Neural Network.- Sparse Kernel Learning and the Relevance Units Machine.- Pairwise Constrained Clustering for Sparse and High Dimensional Feature Spaces.- Clustering Documents Using a Wikipedia-Based Concept Representation.- An Instantiation of Hierarchical Distance-Based Conceptual Clustering for Propositional Learning.- Computing Substitution Matrices for Genomic Comparative Analysis.- Mining Both Positive and Negative Impact-Oriented Sequential Rules from Transactional Data.- Aggregated Subset Mining.- Hot Item Detection in Uncertain Data.- Spanning Tree Based Attribute Clustering.- The Effect of Varying Parameters and Focusing on Bus Travel Time Prediction.- Transfer Learning Action Models by Measuring the Similarity of Different Domains.- On Optimal Rule Mining: A Framework and a Necessary and Sufficient Condition of Antimonotonicity.- Discovering Action Rules That Are Highly Achievable from Massive Data.- Extracting Fuzzy Rules for Detecting Ventricular Arrhythmias Based on NEWFM.- Trace Mining from Distributed Assembly Databases for Causal Analysis.- Let's Tango - Finding the Right Couple for Feature-Opinion Association in Sentiment Analysis.- An Efficient Candidate Pruning Technique for High Utility Pattern Mining.- Grouped ECOC Conditional Random Fields for Prediction of Web User Behavior.- CLHQS: Hierarchical Query Suggestion by Mining Clickthrough Log.- X-Tracking the Changes of Web Navigation Patterns.- Tree-Based Method for Classifying Websites Using Extended Hidden Markov Models.- Emotion Recognition of Pop Music Based on Maximum Entropy with Priors.- Simultaneously Finding Fundamental Articles and New Topics Using a Community Tracking Method.- Towards a Novel Association Measure via Web Search Results Mining.- A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data.- Mining Outliers with Faster Cutoff Update and Space Utilization.- Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data.- K-Dominant Skyline Computation by Using Sort-Filtering Method.- Effective Boosting of Naïve Bayesian Classifiers by Local Accuracy Estimation.- COMUS: Ontological and Rule-Based Reasoning for Music Recommendation System.- Spatial Weighting for Bag-of-Visual-Words and Its Application in Content-Based Image Retrieval.- Item Preference Parameters from Grouped Ranking Observations.- Cross-Channel Query Recommendation on Commercial Mobile Search Engine: Why, How and Empirical Evaluation.- Data Mining for Intrusion Detection: From Outliers to True Intrusions.- A Multi-resolution Approach for Atypical Behaviour Mining.- Change Analysis in Spatial Data by Combining Contouring Algorithms with Supervised Density Functions.- Centroid Neural Network with Spatial Constraints.- Diversity in Combinations of Heterogeneous Classifiers.- Growth Analysis of Neighbor Network for Evaluation of Damage Progress.- A Parallel Algorithm for Finding Related Pages in the Web by Using Segmented Link Structures.- Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study.- Similarity-Based Feature Selection for Learning from Examples with Continuous Values.- Application-Independent Feature Construction from Noisy Samples.- Estimating Optimal Feature Subsets Using Mutual Information Feature Selector and Rough Sets.- Speeding Up Similarity Search on a Large Time Series Dataset under Time Warping Distance.- A Novel Fractal Representation for Dimensionality Reduction of Large Time Series Data.- Clustering Data Streams in Optimization and Geography Domains.- CBDT: A Concept Based Approach to Data Stream Mining.- Meaningful Subsequence Matching under Time Warping Distance for Data Stream.- An Aggregate Ensemble for Mining Concept Drifting Data Streams with Noise.- On Pairwise Kernels: An Efficient Alternative and Generalization Analysis.- A Family-Based Evolutional Approach for Kernel Tree Selection in SVMs.- An Online Incremental Learning Vector Quantization.- On Mining Rating Dependencies in Online Collaborative Rating Networks.- Learning to Extract Relations for Relational Classification.