Data Mining Fundamentals
Overview
As hardware and software continue to advance, capturing and recording data is becoming cheaper and more prevalent. Some search engines, for example, receive billions of queries and collect terabytes of data daily. Because data are becoming available in unprecedented volumes, harvesting and mining them to gain an advantage in business and other activities is of crucial importance. Data mining utilises advances in statistics, machine learning, pattern recognition, databases, and high-performance computing, and has become an indispensable technology for industry and business. It offers tools for discovering relationships, patterns and knowledge in data. The results of a data mining process can provide knowledge and foresight that enable us to plan better, make informed decisions and judgements, predict future activities and achieve higher targets.
This is an introductory course in data mining. It aims to give participants an overview of the principles of data mining, introducing them to what lies at the core of the data mining process and to the scientific and technological techniques it employs. The course concentrates on three core components of data mining, namely data management, machine learning techniques, and data mining applications.
R is used to demonstrate some of the concepts.
Training at our premises
Please fill in the form below, and we will contact you to discuss course availability.
Customised Onsite Training
We can provide customised training for this course, delivered onsite at your premises on the dates most suitable to you. Please fill in the form below and we will contact you to discuss your request and requirements.
Audience
This course is suitable for those who want to learn about data mining or improve their knowledge of it, for either theoretical or practical reasons.
Prerequisites
While a basic knowledge of the principles underlying data mining, such as statistics and machine learning, is beneficial, it is not essential. This is an introductory course and therefore has no prerequisites.
Course Objectives
The course aims to offer candidates:
- An overview of core topics in data mining
- A hands-on appreciation of data mining techniques
- Experience of using R to carry out some data mining tasks
Skills taught
At the end of the course, candidates will have gained the following skills:
- An initial understanding of the concept of data and ways of analysing it
- An initial understanding of machine learning techniques
- An initial knowledge of using R to carry out data mining tasks
- The ability to apply some data mining techniques to analyse data
Reference and Reading Materials
- Introduction to Data Mining. P.-N. Tan, M. Steinbach, V. Kumar (2005)
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction. T. Hastie, R. Tibshirani, J. Friedman (2009)
- Data Mining: Concepts and Techniques. J. Han, M. Kamber, J. Pei. 3rd ed. (2010)
Related Courses
- Advanced Data Mining
- Machine Learning Methods
- Text Mining
Outline
The course is composed of three major parts: data management, data mining methodology (machine learning techniques), and data mining applications.
Part I - Data Management
Data Mining Introduction
Data Mining Concepts and Challenges
Data Mining Origins and Tasks
Exploring Data
Types of Data
Data Collection
Data Storage and Management
Data Preparation
Data Preprocessing
Summary Statistics
Visualization
OLAP and Multidimensional Data Analysis
Part II - Data Mining Methodology
Classification
Approach to Solving a Classification Problem
Decision Tree Induction
Model Overfitting
Classifier Evaluation
Methods for Comparing Classifiers
Rule-Based Classifier
Nearest-Neighbor Classifiers
Bayesian Classifiers
Artificial Neural Network (ANN)
Association Analysis
Frequent Itemset Generation
Rule Generation
Compact Representation of Frequent Itemsets
Methods for Generating Frequent Itemsets
FP-Growth Algorithm
Evaluation of Association Patterns
Effect of Skewed Support Distribution
Handling Categorical and Continuous Attributes
Handling a Concept Hierarchy
Sequential Patterns
Cluster Analysis
K-means
Agglomerative Hierarchical Clustering
DBSCAN
Cluster Evaluation
Clustering Algorithms
Prototype-Based Clustering
Density-Based Clustering
Part III - Data Mining Applications
Anomaly Detection
Mining Customer Relationship Management (CRM) Data
Mining Science and Engineering Data
Mining Geospatial Data
Mining Text Data
Mining Human Performance Data