Data Mining: Concepts and Techniques

Han, Jiawei; Kamber, Micheline; Pei, Jian

dc.contributor.author	Han, Jiawei
dc.contributor.author	Kamber, Micheline
dc.contributor.author	Pei, Jian
dc.date.accessioned	2017-09-14T02:03:48Z
dc.date.available	2017-09-14T02:03:48Z
dc.date.issued	2012
dc.identifier.uri	http://202.88.229.59:8080/xmlui/handle/123456789/1040
dc.description.abstract	The chapters of the third edition are described briefly as follows, with emphasis on the new material. Chapter 1 provides an introduction to the multidisciplinary field of data mining. It discusses the evolutionary path of information technology, which has led to the need for data mining, and the importance of its applications. It examines the data types to be mined, including relational, transactional, and data warehouse data, as well as complex data types such as time-series, sequences, data streams, spatiotemporal data, multimedia data, text data, graphs, social networks, and Web data. The chapter presents a general classification of data mining tasks, based on the kinds of knowledge to be mined, the kinds of technologies used, and the kinds of applications that are targeted. Finally, major challenges in the field are discussed. Chapter 2 introduces the general data features. It first discusses data objects and attribute types and then introduces typical measures for basic statistical data descriptions. It overviews data visualization techniques for various kinds of data. In addition to methods of numeric data visualization, methods for visualizing text, tags, graphs, and multidimensional data are introduced. Chapter 2 also introduces ways to measure similarity and dissimilarity for various kinds of data. Chapter 3 introduces techniques for data preprocessing. It first introduces the concept of data quality and then discusses methods for data cleaning, data integration, data reduction, data transformation, and data discretization. Chapters 4 and 5 provide a solid introduction to data warehouses, OLAP (online analytical processing), and data cube technology. Chapter 4 introduces the basic concepts, modeling, design architectures, and general implementations of data warehouses and OLAP, as well as the relationship between data warehousing and other data generalization methods.
Chapter 5 takes an in-depth look at data cube technology, presenting a detailed study of methods of data cube computation, including Star-Cubing and high-dimensional OLAP methods. Further explorations of data cube and OLAP technologies are discussed, such as sampling cubes, ranking cubes, prediction cubes, multifeature cubes for complex analysis queries, and discovery-driven cube exploration. Chapters 6 and 7 present methods for mining frequent patterns, associations, and correlations in large data sets. Chapter 6 introduces fundamental concepts, such as market basket analysis, with many techniques for frequent itemset mining presented in an organized way. These range from the basic Apriori algorithm and its variations to more advanced methods that improve efficiency, including the frequent pattern growth approach, frequent pattern mining with vertical data format, and mining closed and max frequent itemsets. The chapter also discusses pattern evaluation methods and introduces measures for mining correlated patterns. Chapter 7 is on advanced pattern mining methods. It discusses methods for pattern mining in multilevel and multidimensional space, mining rare and negative patterns, mining colossal patterns and high-dimensional data, constraint-based pattern mining, and mining compressed or approximate patterns. It also introduces methods for pattern exploration and application, including semantic annotation of frequent patterns. Chapters 8 and 9 describe methods for data classification. Due to the importance and diversity of classification methods, the contents are partitioned into two chapters. Chapter 8 introduces basic concepts and methods for classification, including decision tree induction, Bayes classification, and rule-based classification. It also discusses model evaluation and selection methods and methods for improving classification accuracy, including ensemble methods and how to handle imbalanced data.
Chapter 9 discusses advanced methods for classification, including Bayesian belief networks, the neural network technique of backpropagation, support vector machines, classification using frequent patterns, k-nearest-neighbor classifiers, case-based reasoning, genetic algorithms, rough set theory, and fuzzy set approaches. Additional topics include multiclass classification, semi-supervised classification, active learning, and transfer learning. Cluster analysis forms the topic of Chapters 10 and 11. Chapter 10 introduces the basic concepts and methods for data clustering, including an overview of basic cluster analysis methods, partitioning methods, hierarchical methods, density-based methods, and grid-based methods. It also introduces methods for the evaluation of clustering. Chapter 11 discusses advanced methods for clustering, including probabilistic model-based clustering, clustering high-dimensional data, clustering graph and network data, and clustering with constraints. Chapter 12 is dedicated to outlier detection. It introduces the basic concepts of outliers and outlier analysis and discusses various outlier detection methods from the view of degree of supervision (i.e., supervised, semi-supervised, and unsupervised methods), as well as from the view of approaches (i.e., statistical methods, proximity-based methods, clustering-based methods, and classification-based methods). It also discusses methods for mining contextual and collective outliers, and for outlier detection in high-dimensional data. Finally, in Chapter 13, we discuss trends, applications, and research frontiers in data mining. We briefly cover mining complex data types, including mining sequence data (e.g., time series, symbolic sequences, and biological sequences), mining graphs and networks, and mining spatial, multimedia, text, and Web data. In-depth treatment of data mining methods for such data is left to a book on advanced topics in data mining, the writing of which is in progress.
The chapter then moves ahead to cover other data mining methodologies, including statistical data mining, foundations of data mining, visual and audio data mining, as well as data mining applications. It discusses data mining for financial data analysis, for industries like retail and telecommunication, for use in science and engineering, and for intrusion detection and prevention. It also discusses the relationship between data mining and recommender systems. Because data mining is present in many aspects of daily life, we discuss issues regarding data mining and society, including ubiquitous and invisible data mining, as well as privacy, security, and the social impacts of data mining. We conclude our study by looking at data mining trends.	en_US
dc.publisher	Elsevier	en_US
dc.subject	Data Mining	en_US
dc.title	Data Mining: Concepts and Techniques	en_US
dc.type	Book	en_US