Title: DBMS(Database class notes)
Description: These are database class notes. They will be very helpful for you.
Attribute types and different forms of data sets
What is Data?
• Collection of data objects and their attributes
• An attribute is a property or characteristic of an object
– Examples: eye color of a person, temperature, etc.
...
– Ratio
• Examples: length, time
Properties of Attribute Values
• The type of an attribute depends on which of the following properties it possesses:
– Distinctness: =, ≠
– Order: <, >
– Addition: +, -
– Multiplication: *, /
• Each attribute type possesses the following properties:
– Nominal attribute: distinctness
– Ordinal attribute: distinctness & order
– Interval attribute: distinctness, order & addition
– Ratio attribute: all 4 properties
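The relationship between attribute types and properties can be sketched in code. This is only an illustration: the names PROPERTIES, OPERATORS and is_meaningful are hypothetical and do not come from the notes.

```python
# Properties each attribute type possesses, as listed in the notes.
PROPERTIES = {
    "nominal": {"distinctness"},
    "ordinal": {"distinctness", "order"},
    "interval": {"distinctness", "order", "addition"},
    "ratio": {"distinctness", "order", "addition", "multiplication"},
}

# Operators associated with each property.
OPERATORS = {
    "distinctness": {"==", "!="},
    "order": {"<", ">"},
    "addition": {"+", "-"},
    "multiplication": {"*", "/"},
}


def is_meaningful(attribute_type, operator):
    """Return True if the operator is meaningful for the attribute type."""
    return any(
        operator in OPERATORS[prop] for prop in PROPERTIES[attribute_type]
    )
```

For example, is_meaningful("ordinal", "<") is True, while is_meaningful("interval", "*") is False, because interval attributes lack the multiplication property.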
Attribute Type | Description | Examples | Operations
Nominal | The values of a nominal attribute are just different names, i.e., nominal attributes provide only enough information to distinguish one object from another
...
hardness of minerals, {good, better, best}, grades, street numbers | median, percentiles, rank correlation, run tests, sign tests
Interval | For interval attributes, the differences between values are meaningful, i.e., a unit of measurement exists
...
(*, /) | monetary quantities, counts, age, mass, length, electrical current
Interval data is a data type measured along a scale in which each point is placed at an equal distance from the next.
...
Interval data cannot be multiplied or divided; however, it can be added or subtracted.
...
A simple example of interval data: the difference between 100 degrees Fahrenheit and 90 degrees Fahrenheit is the same as the difference between 70 degrees Fahrenheit and 60 degrees Fahrenheit.
...
Ratio data has a defined zero point
...
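A small sketch of why differences are meaningful on an interval scale while ratios are not. The Celsius conversion and the length values are illustrative assumptions added here, not part of the notes.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0


# Interval scale: equal differences mean the same thing anywhere on the scale.
diff_high = 100 - 90  # 10 degrees Fahrenheit
diff_low = 70 - 60    # also 10 degrees Fahrenheit

# A ratio of interval values changes with the unit, so it carries no meaning:
# "100 F is twice 50 F" does not survive conversion to Celsius.
ratio_f = 100 / 50
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

# Ratio scale (length has a true zero): the ratio is the same in any unit.
ratio_m = 2.0 / 1.0       # 2 m versus 1 m
ratio_cm = 200.0 / 100.0  # the same lengths in centimetres
```

Here ratio_f is 2.0 but ratio_c is roughly 3.78, whereas ratio_m and ratio_cm agree.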
Discrete and Continuous Attributes
• Discrete Attribute
– Has only a finite or countably infinite set of values
– Examples: zip codes, counts, or the set of words in a collection of
documents
– Often represented as integer variables
...
– Practically, real values can only be measured and represented using a
finite number of digits
...
Types of data sets
• Record
– Data Matrix
– Document Data
– Transaction Data
• Graph
– World Wide Web
• Ordered
– Spatial Data
– Temporal Data
– Sequential Data
– Genetic Sequence Data
Record Data
• Data that consists of a collection of records, each of which consists of a fixed set of attributes

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Data Matrix
• If data objects have the same fixed set of numeric attributes, then
the data objects can be thought of as points in a multi-dimensional
space, where each dimension represents a distinct attribute
• Such a data set can be represented by an m by n matrix, where there
are m rows, one for each object, and n columns, one for each
attribute
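A minimal sketch of the m by n representation, using plain Python lists. The attribute names and all numeric values below are hypothetical placeholders chosen for illustration.

```python
# Hypothetical attribute names and values, for illustration only.
attributes = ["distance", "load", "thickness"]
objects = [
    {"distance": 15.0, "load": 2.5, "thickness": 1.0},
    {"distance": 16.0, "load": 2.0, "thickness": 1.5},
]

# One row per object, one column per attribute.
matrix = [[obj[name] for name in attributes] for obj in objects]

m = len(matrix)     # number of rows (objects)
n = len(matrix[0])  # number of columns (attributes)
```

Each row of matrix can then be read as a point in an n-dimensional space.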
Projection of x Load | Projection of y Load | Distance | Load | Thickness
...
Text Data
• Each document becomes a 'term' vector,
– each term is a component (attribute) of the vector,
– the value of each component is the number of times the corresponding term
occurs in the document
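The term vector described above can be sketched with collections.Counter. The vocabulary and the sample document are made up for illustration, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter


def term_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(document.lower().split())
    # A Counter returns 0 for terms that never occur.
    return [counts[term] for term in vocabulary]


# Hypothetical vocabulary and document.
vocabulary = ["team", "coach", "play", "ball", "score"]
document = "team play ball team score"
vector = term_vector(document, vocabulary)
```

With these inputs, vector has one component per vocabulary term, and the component for "team" is 2.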
...
– For example, consider a grocery store
...
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
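One way to hold the transactions in the table above is as a set of items per TID. The helper name transactions_containing is hypothetical; this is a sketch, not a prescribed representation.

```python
from collections import Counter

# The five transactions from the table, keyed by TID.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

# Number of transactions in which each item appears.
item_counts = Counter(
    item for items in transactions.values() for item in items
)


def transactions_containing(itemset):
    """Return the sorted TIDs whose item set includes every item in itemset."""
    return sorted(
        tid for tid, items in transactions.items() if itemset <= items
    )
```

For instance, transactions_containing({"Diaper", "Milk"}) returns the TIDs 3, 4 and 5.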
Graph Data
• Examples: Facebook graph and HTML Links
Ordered Data
• Genomic sequence data
GGTTCCGCCTTCAGCCCCGCGCC
CGCAGGGCCCGCCCCGCGCCGTC
GAGAAGGGCCCGCCTGGCGGGCG
GGGGGAGGCGGGGCCGCCCGAGC
CCAACCGAGTCCGACCAGGTGCC
CCCTCTGCTCGGCCTAGACCTGA
GCTCATTAGGCGGCAGCGGACAG
GCCAAGTAGAACACGCGAAGCGC
TGGGCTGCCTGCTGCGACCAGGG
Data Quality
• What kinds of data quality problems?
• How can we detect problems with the data?
• What can we do about these problems?
• Examples of data quality problems:
– Noise and outliers
– Missing values
– Duplicate data
Noise
• Noise refers to modification of original values
– Examples: distortion of a person’s voice when talking on a poor phone and “snow” on a television screen
[Figure: Two Sine Waves; Two Sine Waves + Noise]
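The effect of noise on a signal can be sketched as follows. The frequencies, the Gaussian noise model, the noise level and the seed are all assumptions made for illustration; the notes do not specify how the figure was produced.

```python
import math
import random


def two_sine_waves(t, f1=1.0, f2=3.0):
    """Sum of two sine waves at time t (frequencies are assumed values)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def add_noise(values, sigma, seed=0):
    """Modify each original value by adding Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [v + rng.gauss(0.0, sigma) for v in values]


times = [i / 100 for i in range(100)]
clean = [two_sine_waves(t) for t in times]
noisy = add_noise(clean, sigma=0.5)
```

The list noisy has the same length as clean, but each value has been shifted away from its original.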
Outliers
• Outliers are data objects with characteristics that are
considerably different than most of the other data objects in
the data set
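One common way to flag candidate outliers is the 1.5 x IQR rule. That rule is an assumption brought in here for illustration, not something stated in the notes. The values are the taxable incomes (in thousands) from the record data table.

```python
from statistics import quantiles

# Taxable income in thousands, taken from the record data table.
incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


outliers = iqr_outliers(incomes)
```

With this data the rule flags only the 220K income. Whether such a value is an error or a legitimate extreme is a separate judgment.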
Missing Values
• Reasons for missing values
– Information is not collected (e.g., people decline to give their age and weight)
– Attributes may not be applicable to all cases (e.g., annual income is not applicable to children)
• Handling missing values
– Eliminate Data Objects
– Estimate Missing Values
– Ignore the Missing Value During Analysis
– Replace with all possible values (weighted by their probabilities)
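Three of these strategies can be sketched on a single attribute. The ages below are hypothetical, missing values are represented as None, and estimating with the mean is just one possible choice of estimate.

```python
from statistics import mean

# Hypothetical ages; None marks a missing value.
ages = [25, None, 31, 40, None, 28]


def eliminate(values):
    """Eliminate data objects whose value is missing."""
    return [v for v in values if v is not None]


def estimate_with_mean(values):
    """Estimate each missing value as the mean of the observed values."""
    fill = mean(eliminate(values))
    return [fill if v is None else v for v in values]


def mean_ignoring_missing(values):
    """Ignore missing values during the analysis (here, computing a mean)."""
    return mean(v for v in values if v is not None)
```

Eliminating shrinks the data set, while estimating keeps every object but introduces values that were never observed.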
Duplicate Data
• Data set may include data objects that are duplicates, or
almost duplicates of one another
– Major issue when merging data from heterogeneous sources
• Examples:
– Same person with multiple email addresses
• Data cleaning
– Process of dealing with duplicate data issues
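A minimal sketch of merging records for the same person with multiple email addresses. The records are invented, and matching on a normalized name alone is a deliberately naive assumption: real data cleaning usually needs more evidence than a name.

```python
# Hypothetical records merged from two sources.
records = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "ann  lee ", "email": "a.lee@example.org"},
    {"name": "Bob Ray", "email": "bob@example.com"},
]


def normalize(name):
    """Lowercase the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def merge_duplicates(rows):
    """Group email addresses under each normalized name."""
    merged = {}
    for row in rows:
        merged.setdefault(normalize(row["name"]), []).append(row["email"])
    return merged


people = merge_duplicates(records)
```

Here the two "Ann Lee" records collapse into one entry with two email addresses.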
Difference between Data Analytics and Data Mining:
A key difference between data analytics and data mining is that data
mining does not require any preconceived hypothesis or notions before
tackling the data
...
However,
data analysis does need a hypothesis to test, as it is looking for answers
to particular questions
...
With the right software, they are able to collect the
data ready for further analysis
...
From here, a data mining specialist will usually report their
findings to the client, leaving the next steps in someone else’s hands
...
They need to assess the data, figure out
patterns, and draw conclusions
...
Data analytics teams need to know the right questions to ask, e.g.:
– the relation between reader gender and English paper
– the relation between trained and untrained students versus committing errors
– inoculated with vaccine or not inoculated, versus died of disease or survived
Data mining usually does not need any visualizations, bar charts, graphs
etc
...
Without a good representation of the data in question, all the
efforts which are put into the analysis of the data would not come to
fruition