Title: Data Mining Tutorial

Data Mining Tutorial
There is a wealth of data available to the information sector
...
To
extract pertinent information from this enormous amount of
data, analysis is required
...
We will be able to use this information
after completing all of these stages for a number of objectives,
such as fraud detection, market analysis, production control,
scientific investigation, etc
...
Data mining is, in other words, the
process of extracting knowledge from data
...

Identifying Customer Requirements − Data mining
helps in identifying the best products for different
customers
...

Cross Market Analysis − Data mining finds
associations/correlations between product sales
...

Determining Customer Purchasing Patterns − Data
mining helps in determining customer purchasing
patterns
...


Corporate Analysis and Risk Management
Data mining is used in the following fields of the Corporate
Sector −
• Finance Planning and Asset Evaluation − It involves
cash flow analysis and prediction, and contingent claim
analysis to evaluate assets
...

• Competition − It involves monitoring competitors and
market directions
...
Finding the call's
destination, duration, day of the week, etc
...
Additionally, it
examines patterns that differ from the usual
...

On the basis of the kind of data to be mined, there are two
categories of functions involved in Data Mining −

• Descriptive
• Classification and Prediction

Descriptive Function
The descriptive function deals with the general properties of
data in the database
...
For instance, a corporation might
sell computers and printers, and its concepts of customers might
include big spenders and budget spenders
...
The following
two methods can be used to derive these descriptions −
• Data Characterization − This refers to summarizing
the data of the class under study
...

• Data Discrimination − It refers to the mapping or
classification of a class with some predefined group or
class
...
Here is the list of kinds of frequent patterns −

• Frequent Item Set − It refers to a set of items that
frequently appear together, for example, milk and
bread
...

• Frequent Sub Structure − Substructure refers to
different structural forms, such as graphs, trees, or
lattices, which may be combined with item-sets or
subsequences
...
This refers to the
process of uncovering the relationships among data and
determining association rules
...
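
As a minimal sketch of frequent item sets and association rules (not taken from the original notes), the snippet below counts how often pairs of items appear together in a few made-up baskets and then computes the confidence of one rule. The basket contents, the `min_support` threshold, and the function names are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
BASKETS = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (fraction of baskets) is >= min_support."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


def confidence(baskets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


if __name__ == "__main__":
    print(frequent_pairs(BASKETS, min_support=0.4))
    print(confidence(BASKETS, "milk", "bread"))
```

With this toy data, milk and bread form the most frequent pair, and every basket that contains milk also contains bread. Real miners such as Apriori prune the search instead of counting every pair.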

Mining of Correlations
It is a kind of additional analysis performed to uncover
interesting statistical correlations between associated attribute-value pairs or between two item sets, to analyze whether they have
a positive, negative, or no effect on each other
...
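
One common way to check whether two items have a positive, negative, or no effect on each other is the lift measure. The notes do not name a specific measure, so lift is an assumption here, as are the transactions and the function name. This is only a sketch.

```python
# Hypothetical transactions used only for this example.
TRANSACTIONS = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "bread"},
    {"tea"},
    {"tea", "milk"},
    {"tea"},
]


def lift(transactions, item_a, item_b):
    """Lift of two items: P(A and B) / (P(A) * P(B)).

    A value above 1 suggests a positive correlation, below 1 a negative
    one, and a value close to 1 suggests the items are independent.
    """
    total = len(transactions)
    count_a = sum(1 for t in transactions if item_a in t)
    count_b = sum(1 for t in transactions if item_b in t)
    count_both = sum(1 for t in transactions if item_a in t and item_b in t)
    if count_a == 0 or count_b == 0:
        raise ValueError("both items must occur at least once")
    return (count_both * total) / (count_a * count_b)


if __name__ == "__main__":
    print(lift(TRANSACTIONS, "milk", "bread"))  # above 1: bought together
    print(lift(TRANSACTIONS, "milk", "tea"))    # below 1: rarely together
```
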
Cluster
analysis refers to forming groups of objects that are very similar
to each other but are highly different from the objects in other
clusters
...
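
To make the idea of clusters concrete, here is a deliberately simple sketch that groups one-dimensional values by the gaps between them. It is a stand-in for real clustering algorithms such as k-means, not a method from the notes. The spending figures and the `max_gap` parameter are assumptions.

```python
def gap_clusters(values, max_gap):
    """Group 1-D values into clusters.

    Values are sorted first; a new cluster starts whenever the gap to the
    previous value is larger than max_gap.
    """
    clusters = []
    for value in sorted(values):
        if clusters and value - clusters[-1][-1] <= max_gap:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return clusters


if __name__ == "__main__":
    # Hypothetical annual spend per customer.
    annual_spend = [120, 950, 135, 4000, 150, 900]
    print(gap_clusters(annual_spend, max_gap=100))
```

Values inside one group are close to each other, while the groups themselves are far apart, which is the property cluster analysis looks for.
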
The goal is to
be able to forecast the class of objects whose class label is
unknown using this model
...
The following formats for
presenting the generated model are available −

• Classification (IF-THEN) Rules
• Decision Trees
• Mathematical Formulae
• Neural Networks
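
As an illustration of the first format, a model expressed as classification (IF-THEN) rules can be written directly as code. In practice such rules are derived from training data; the rules, thresholds, and the "regular" class below are hand-written assumptions used only to show the shape of the format.

```python
def classify_customer(annual_spend, purchases_per_year):
    """Apply hand-written IF-THEN rules in order; the first match wins."""
    # IF annual_spend >= 5000 THEN class = "big spender"
    if annual_spend >= 5000:
        return "big spender"
    # IF annual_spend < 1000 AND purchases_per_year <= 2 THEN class = "budget spender"
    if annual_spend < 1000 and purchases_per_year <= 2:
        return "budget spender"
    # Default rule when nothing else applies (hypothetical class).
    return "regular"


if __name__ == "__main__":
    print(classify_customer(8000, 3))
    print(classify_customer(500, 1))
    print(classify_customer(2000, 5))
```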

The list of functions involved in these processes is as follows −

• Classification − It predicts the class of objects whose
class label is unknown
...
The derived model is based on the analysis of a set of
training data, i.e., data objects whose class labels are
known
...

Regression Analysis is generally used for prediction
...

Outlier Analysis − Outliers may be defined as the data
objects that do not comply with the general behavior
or model of the data available
...
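
A minimal sketch of outlier analysis, assuming a simple z-score test (one of several possible techniques, and not one the notes name): values that lie more than `threshold` standard deviations from the mean are reported. The sample values and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical call durations in minutes; one call is unusually long.
    durations = [10, 12, 11, 13, 12, 95]
    print(z_score_outliers(durations))
```

Note that a single extreme value also inflates the mean and standard deviation, so on small samples a robust measure such as the median may be preferable.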

Data Mining Task Primitives
• We can specify a data mining task in the form of a
data mining query
...

• A data mining query is defined in terms of data mining
task primitives
...
Here is the list
of Data Mining Task Primitives −
• Set of task-relevant data to be mined
...

• Background knowledge to be used in discovery
process
...

• Representation for visualizing the discovered
patterns
...

This portion includes the following −

• Database Attributes
• Data Warehouse dimensions of interest

Kind of knowledge to be mined
It refers to the kind of functions to be performed
...
For example, concept hierarchies are
one kind of background knowledge that allows data to be mined
at multiple levels of abstraction
...
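
A concept hierarchy can be sketched as a simple mapping from a lower level to a higher one. The example below rolls sales up from city level to country level; the city names, sales figures, and function name are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {
    "Chicago": "USA",
    "New York": "USA",
    "Toronto": "Canada",
    "Vancouver": "Canada",
}

# Hypothetical sales records at the city level: (city, amount).
SALES = [
    ("Chicago", 120),
    ("New York", 300),
    ("Toronto", 80),
    ("Vancouver", 50),
    ("Chicago", 30),
]


def roll_up(sales, hierarchy):
    """Aggregate sales one level up the concept hierarchy."""
    totals = Counter()
    for city, amount in sales:
        totals[hierarchy[city]] += amount
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SALES, CITY_TO_COUNTRY))
```

The same records can then be mined at either level of abstraction: per city or per country.
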
There are different interestingness
measures for different kinds of knowledge
...
These representations may include the following
...
There are numerous heterogeneous data
sources that must be combined
...
In this tutorial, we'll go through the key
concerns relating to −

• Mining Methodology and User Interaction
• Performance Issues
• Diverse Data Types Issues

The following diagram describes the major issues
...
Therefore, it is necessary for data mining to
cover a broad range of knowledge discovery tasks
...

• Incorporation of background knowledge − Background
knowledge can be used to guide the discovery process
and to express the discovered patterns
...

• Data mining query languages and ad hoc data mining −
A data mining query language that allows the user to
describe ad hoc mining tasks should be integrated
with a data warehouse query language and optimized
for efficient and flexible data mining
...
These representations should be
easily understandable
...

If data cleaning methods are not applied, the
accuracy of the discovered patterns will be poor
...

Performance Issues
Performance-related issues include the following −

• Efficiency and scalability of data mining algorithms − In
order to effectively extract information from the huge
amounts of data in databases, data mining algorithms
must be efficient and scalable
...
These
algorithms divide the data into partitions, which are
then processed in parallel
...
Incremental
algorithms update databases without mining the data
again from scratch
...
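
The partitioning and incremental ideas can be sketched with simple item counting. Below, the data is split into partitions whose partial counts are merged, and new transactions are later folded into the existing counts without rescanning the old data. The partitions are processed in a plain loop here; running them on separate workers is left out to keep the sketch short. All names and data are assumptions.

```python
from collections import Counter


def count_items(transactions):
    """Count how many transactions mention each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts


def count_in_partitions(transactions, n_partitions):
    """Split the data, count each partition separately, then merge."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    # Each partial count is independent, so it could run on its own worker.
    partial_counts = [count_items(p) for p in partitions]
    merged = Counter()
    for partial in partial_counts:
        merged.update(partial)
    return merged


def add_new_transactions(counts, new_transactions):
    """Incremental update: only the new transactions are scanned."""
    counts.update(count_items(new_transactions))
    return counts


if __name__ == "__main__":
    data = [["milk", "bread"], ["milk"], ["bread", "eggs"], ["milk", "eggs"]]
    counts = count_in_partitions(data, n_partitions=2)
    print(counts)
    print(add_new_transactions(counts, [["milk", "tea"]]))
```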
It is not possible for one system to mine all these
kinds of data
...
These data
sources may be structured, semi-structured, or
unstructured
...



Data Mining - Evaluation

Data Warehouse
A data warehouse exhibits the following characteristics to
support the management's decision-making process −
• Subject Oriented − A data warehouse is subject oriented
because it provides information about a
subject rather than about the organization's ongoing
operations
...
The data warehouse
does not focus on the ongoing operations; rather, it
focuses on modelling and analysis of data for decision-making
...
This integration
enhances the effective analysis of data
...
The data in a
data warehouse provides information from a historical
point of view
...
The data
warehouse is kept separate from the operational
database; therefore, frequent changes in the operational
database are not reflected in the data warehouse
...
A data warehouse is constructed by integrating
the data from multiple heterogeneous sources
...

Data warehousing involves data cleaning, data integration, and
data consolidation
...
This approach is used to build wrappers and
integrators on top of multiple heterogeneous databases
...

Process of Query Driven Approach
• When a query is issued on the client side, a metadata
dictionary translates the query into queries
appropriate for each individual heterogeneous site
involved
...

• The results from heterogeneous sites are integrated
into a global answer set
...
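
The query-driven process can be sketched as a small mediator. A metadata dictionary maps global field names to each site's local field names, the query is run against every site at query time, and the results are merged into one answer set. The two in-memory "sites", their field names, and the function name are all hypothetical.

```python
# Hypothetical sites that store the same kind of data under different field names.
SITE_A = [
    {"cust_name": "Ann", "city": "Leeds"},
    {"cust_name": "Bob", "city": "York"},
]
SITE_B = [
    {"name": "Cara", "town": "Leeds"},
]
SITES = {"site_a": SITE_A, "site_b": SITE_B}

# Hypothetical metadata dictionary: global field name -> local field name.
METADATA = {
    "site_a": {"name": "cust_name", "city": "city"},
    "site_b": {"name": "name", "city": "town"},
}


def query_all_sites(city):
    """Find customers in `city` across all sites and merge the results."""
    answer = []
    for site_name, rows in SITES.items():
        mapping = METADATA[site_name]
        # Translate the global query into the site's own field names.
        local_city = mapping["city"]
        local_name = mapping["name"]
        for row in rows:
            if row[local_city] == city:
                answer.append(
                    {"name": row[local_name], "city": city, "source": site_name}
                )
    return answer


if __name__ == "__main__":
    print(query_all_sites("Leeds"))
```

Every call visits every site again, which hints at why this approach becomes costly for frequent queries.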

It is very inefficient and very expensive for frequent
queries
...

Update-Driven Approach


Today's data warehouse systems follow the update-driven approach
rather than the traditional approach discussed earlier
...
This information is available for direct querying and
analysis
...

The data can be copied, processed, integrated,
annotated, summarized and restructured in the
semantic data store in advance
...
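
By contrast, an update-driven design can be sketched as a store that is integrated and summarized in advance, so a query only reads the prepared result. The two source lists, their field names, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical operational sources with different field names.
SOURCE_ORDERS = [
    {"region": "north", "amount": 100},
    {"region": "south", "amount": 40},
]
SOURCE_WEB = [
    {"area": "north", "total": 60},
]


def build_warehouse():
    """Integrate and summarize the sources in advance: region -> total sales."""
    store = defaultdict(int)
    for row in SOURCE_ORDERS:
        store[row["region"]] += row["amount"]
    for row in SOURCE_WEB:
        store[row["area"]] += row["total"]
    return dict(store)


# Built ahead of time and refreshed periodically, not once per query.
WAREHOUSE = build_warehouse()


def total_sales(region):
    """Answer a query directly from the pre-integrated store."""
    return WAREHOUSE.get(region, 0)


if __name__ == "__main__":
    print(total_sales("north"))
```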

From Data Warehousing (OLAP) to Data Mining (OLAM)
Online Analytical Mining (OLAM) integrates Online Analytical
Processing (OLAP) with data mining and mining knowledge in
multidimensional databases
...
These steps are very
costly in the preprocessing of data
...

Available information processing infrastructure
surrounding data warehouses − Information
processing infrastructure refers to accessing,
integration, consolidation, and transformation of
multiple heterogeneous databases, web-accessing and
service facilities, reporting and OLAP analysis tools
...

OLAM provides facilities for data mining on various
subsets of data and at different levels of abstraction
...


Data Mining - Terminologies

Data Mining
Data mining is defined as extracting information from a huge
set of data
...
This information can be used for any
of the following applications −
• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration

Data Mining Engine
The data mining engine is essential to the data mining system
...
This knowledge is used to guide
the search or evaluate the interestingness of the resulting
patterns
...
Here is the list of steps involved in the
knowledge discovery process −

• Data Cleaning
• Data Integration
• Data Selection
• Data Transformation
• Data Mining
• Pattern Evaluation
• Knowledge Presentation

User Interface
The user interface is the module of a data mining system that
enables communication between users and the data mining system
...

• Providing information to help focus the search
...

• Browse database and data warehouse schemas or
data structures
...

• Visualize the patterns in different forms
...
Data integration may involve inconsistent
data and therefore needs data cleaning
...
Data cleaning
involves transformations to correct wrong data
...

Data Selection
Data Selection is the process where data relevant to the analysis
task are retrieved from the database
...

Clusters
A cluster refers to a group of similar objects
...

Data Transformation
In this step, data is transformed or consolidated into forms
appropriate for mining, by performing summary or aggregation
operations
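
As a small sketch of transformation by aggregation, the snippet below consolidates daily sales into monthly totals. The dates, amounts, and function name are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical daily sales records: (ISO date, amount).
DAILY_SALES = [
    ("2024-01-03", 20.0),
    ("2024-01-17", 35.5),
    ("2024-02-02", 10.0),
]


def monthly_totals(daily_sales):
    """Aggregate daily amounts into one total per month (YYYY-MM)."""
    totals = defaultdict(float)
    for date, amount in daily_sales:
        month = date[:7]  # "2024-01-03" -> "2024-01"
        totals[month] += amount
    return dict(totals)


if __name__ == "__main__":
    print(monthly_totals(DAILY_SALES))
```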