Title: DBMS (Database class notes)
Description: These are database class notes. They will be very helpful for you.



Attributes types and different form of datasets


What is Data?




• Collection of data objects and their attributes
• An attribute is a property or characteristic of an object
– Examples: eye color of a person, temperature, etc
...
g
...


– Ratio


Examples: length, time

Properties of Attribute Values
• The type of an attribute depends on which of the following
properties it possesses:
– Distinctness: = ≠
– Order: < >
– Addition: + -
– Multiplication: * /

Nominal attribute: distinctness
Ordinal attribute: distinctness & order
Interval attribute: distinctness, order & addition
Ratio attribute: all 4 properties
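
The cumulative relationship between the four properties and the four attribute types can be sketched in Python. This is only an illustration: the names `PROPERTIES`, `ATTRIBUTE_TYPES`, `properties_of`, and `supports` are assumptions made for the example and do not come from the notes.

```python
# Properties in the order the notes list them.
PROPERTIES = ["distinctness", "order", "addition", "multiplication"]

# Each attribute type possesses the first k properties (cumulative).
ATTRIBUTE_TYPES = {
    "nominal": 1,
    "ordinal": 2,
    "interval": 3,
    "ratio": 4,
}


def properties_of(attribute_type):
    """Return the list of properties an attribute type possesses."""
    return PROPERTIES[:ATTRIBUTE_TYPES[attribute_type]]


def supports(attribute_type, prop):
    """Return True if values of this attribute type possess the property."""
    return prop in properties_of(attribute_type)


print(properties_of("interval"))        # ['distinctness', 'order', 'addition']
print(supports("ordinal", "addition"))  # False
```

A lookup like this could be used to reject operations that are not meaningful for a column, for example averaging a nominal attribute.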

Attribute types: description, examples, operations

• Nominal
– Description: The values of a nominal attribute are just different names, i.e., nominal attributes provide only enough information to distinguish one object from another
...
• Ordinal
– Examples: hardness of minerals, {good, better, best}, grades, street numbers
– Operations: median, percentiles, rank correlation, run tests, sign tests
• Interval
– Description: For interval attributes, the differences between values are meaningful, i.e., a unit of measurement exists
...
• Ratio
– Description (fragment): (*, /)
– Examples: monetary quantities, counts, age, mass, length, electrical current

Interval data is a data type measured along a scale in which each point is placed at an equal distance from the next
...

Interval data cannot be multiplied or divided; however, it can be added or subtracted
...
A simple example of interval data: the difference between 100 degrees Fahrenheit and 90 degrees Fahrenheit is the same as the difference between 70 degrees Fahrenheit and 60 degrees Fahrenheit
...
Ratio data has a defined zero point
...
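
The Fahrenheit example can be checked with a short sketch. It assumes the Kelvin scale as a ratio-scale counterpart, since Kelvin has a true zero; the helper name `fahrenheit_to_kelvin` is introduced here for illustration only.

```python
def fahrenheit_to_kelvin(f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale, true zero)."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


# Differences are meaningful on an interval scale.
print(100 - 90 == 70 - 60)  # True

# Ratios are not: 100 F is not "twice as hot" as 50 F.
naive_ratio = 100 / 50
kelvin_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)
print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 3))  # 1.098
```

The ratio only becomes meaningful after moving to a scale with a defined zero point, which is the property that separates ratio data from interval data.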


Discrete and Continuous Attributes
• Discrete Attribute

– Has only a finite or countably infinite set of values
– Examples: zip codes, counts, or the set of words in a collection of
documents
– Often represented as integer variables
...

– Practically, real values can only be measured and represented using a
finite number of digits
...


Types of data sets
• Record
– Data Matrix
– Document Data
– Transaction Data

• Graph
– World Wide Web

• Ordered
– Spatial Data
– Temporal Data
– Sequential Data
– Genetic Sequence Data

Record Data


Data that consists of a collection of records, each of which consists of a fixed set of attributes

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes
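
As a rough sketch, record data of this kind can be held in Python as a list of dictionaries that all share the same keys. The first three rows of the record data are used; the names `FIELDS`, `rows`, and `records` are illustrative choices, not part of the notes.

```python
# The fixed set of attributes shared by every record.
FIELDS = ("Tid", "Refund", "Marital Status", "Taxable Income", "Cheat")

# First three records from the notes; the remaining rows follow the same shape.
rows = [
    (1, "Yes", "Single", "125K", "No"),
    (2, "No", "Married", "100K", "No"),
    (3, "No", "Single", "70K", "No"),
]

# One dictionary per record, keyed by attribute name.
records = [dict(zip(FIELDS, row)) for row in rows]

# Select the Tid of every record whose marital status is Single.
single = [r["Tid"] for r in records if r["Marital Status"] == "Single"]
print(single)  # [1, 3]
```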

Data Matrix

• If data objects have the same fixed set of numeric attributes, then
the data objects can be thought of as points in a multi-dimensional
space, where each dimension represents a distinct attribute
• Such a data set can be represented by an m by n matrix, where there
are m rows, one for each object, and n columns, one for each
attribute
Projection of x Load    Projection of y Load    Distance    Load    Thickness

10
...
27

15
...
7

1
...
65

6
...
22

2
...
1
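
The m by n layout can be sketched with nested lists, where each row is one object and each column one attribute. The numbers below are placeholders chosen for the example, not the values from the notes, and the helpers `column` and `euclidean` are hypothetical names.

```python
import math

# m objects (rows) by n numeric attributes (columns); placeholder values.
data_matrix = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
m = len(data_matrix)     # number of objects
n = len(data_matrix[0])  # number of attributes


def column(matrix, j):
    """Return attribute j for every object."""
    return [row[j] for row in matrix]


def euclidean(p, q):
    """Distance between two objects viewed as points in n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(m, n)                    # 2 3
print(column(data_matrix, 1))  # [2.0, 5.0]
print(euclidean(data_matrix[0], data_matrix[1]))
```

Treating objects as points is what makes geometric notions such as distance available for this kind of data.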

Text Data


Each document becomes a 'term' vector,
– each term is a component (attribute) of the vector,
– the value of each component is the number of times the corresponding term
occurs in the document
...
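
A minimal sketch of building such a term vector is shown here, assuming whitespace tokenization and lowercasing, which the notes do not specify. The function name `term_vector` and the sample vocabulary are made up for the example.

```python
from collections import Counter


def term_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(document.lower().split())
    # Counter returns 0 for terms that never occur.
    return [counts[term] for term in vocabulary]


vocabulary = ["data", "mining", "notes"]
doc = "Data mining notes about data"
print(term_vector(doc, vocabulary))  # [2, 1, 1]
```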

– For example, consider a grocery store
...

TID   Items
1     Bread, Coke, Milk
2     Beer, Bread
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk
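
One possible representation of this transaction data is a mapping from TID to a set of items. The helper `support_count`, which counts the transactions containing a given group of items, is an illustrative addition and not something the notes define.

```python
# Each transaction is a set of items, keyed by its TID.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}


def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for items in transactions.values() if wanted <= items)


print(support_count({"Milk"}))            # 4
print(support_count({"Diaper", "Milk"}))  # 3
```

Sets fit here because a transaction records which items were bought together, not the order in which they were scanned.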

Graph Data
• Examples: Facebook graph and HTML Links
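
Graph data such as HTML links is often stored as an adjacency list. The sketch below uses hypothetical page names and a hypothetical helper `in_degree`; it only illustrates the idea of nodes (pages) and directed edges (links).

```python
# Hypothetical pages mapped to the pages they link to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": [],
}


def in_degree(graph):
    """Count incoming links for every page in the graph."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


print(in_degree(links))  # {'a.html': 0, 'b.html': 1, 'c.html': 2}
```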

Ordered Data
• Genomic sequence data
GGTTCCGCCTTCAGCCCCGCGCC
CGCAGGGCCCGCCCCGCGCCGTC
GAGAAGGGCCCGCCTGGCGGGCG
GGGGGAGGCGGGGCCGCCCGAGC
CCAACCGAGTCCGACCAGGTGCC
CCCTCTGCTCGGCCTAGACCTGA
GCTCATTAGGCGGCAGCGGACAG
GCCAAGTAGAACACGCGAAGCGC
TGGGCTGCCTGCTGCGACCAGGG

Data Quality





• What kinds of data quality problems?
• How can we detect problems with the data?
• What can we do about these problems?
• Examples of data quality problems:
– Noise and outliers
– Missing values
– Duplicate data

Noise


• Noise refers to modification of original values
– Examples: distortion of a person’s voice when talking on a poor phone
and “snow” on a television screen

[Figure: Two Sine Waves; Two Sine Waves + Noise]
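
The idea of two sine waves with added noise can be sketched as follows. The frequencies, the Gaussian noise model, the noise scale, and the function names are all assumptions made for the example; the notes do not state how the original figure was produced.

```python
import math
import random


def two_sine_waves(t):
    """Sum of two sine waves; the frequencies 1 and 3 are arbitrary choices."""
    return math.sin(2 * math.pi * t) + math.sin(2 * math.pi * 3 * t)


def add_noise(values, scale, seed=0):
    """Return a copy of values with Gaussian noise of the given scale added."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    return [v + rng.gauss(0.0, scale) for v in values]


ts = [i / 100 for i in range(100)]
clean = [two_sine_waves(t) for t in ts]
noisy = add_noise(clean, scale=0.5)

# The noisy series is a modified version of the original values.
print(max(abs(a - b) for a, b in zip(clean, noisy)) > 0)  # True
```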

Outliers
• Outliers are data objects with characteristics that are
considerably different from most of the other data objects in
the data set
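
One simple way to flag such objects is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. This is only one possible technique and is not prescribed by the notes; the data, the threshold of 2, and the function name `zscore_outliers` are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold std devs."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up measurements with one value far from the rest.
measurements = [10, 11, 9, 10, 12, 11, 10, 50]
print(zscore_outliers(measurements))  # [50]
```

The mean and standard deviation are themselves pulled by extreme values, so a median-based rule may be more robust on small samples.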

Missing Values

• Reasons for missing values

– Information is not collected
(e.g., people decline to give their age and weight)
– Attributes may not be applicable to all cases
(e.g., annual income is not applicable to children)

• Handling missing values
– Eliminate Data Objects
– Estimate Missing Values
– Ignore the Missing Value During Analysis
– Replace with all possible values (weighted by their probabilities)
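
The first three handling strategies can be sketched on a small list where `None` marks a missing value. The ages are made up, and using the mean as the estimate is an assumption; the notes do not say how missing values should be estimated.

```python
from statistics import mean

# Made-up ages; None marks a missing value.
ages = [23, None, 31, 27, None]


def eliminate(values):
    """Eliminate data objects that have a missing value."""
    return [v for v in values if v is not None]


def estimate(values):
    """Estimate missing values with the mean of the observed ones."""
    fill = mean(eliminate(values))
    return [fill if v is None else v for v in values]


def mean_ignoring_missing(values):
    """Ignore missing values during analysis (here, when averaging)."""
    return mean(eliminate(values))


print(eliminate(ages))              # [23, 31, 27]
print(estimate(ages))               # [23, 27, 31, 27, 27]
print(mean_ignoring_missing(ages))  # 27
```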

Duplicate Data
• A data set may include data objects that are duplicates, or
almost duplicates, of one another
– Major issue when merging data from heterogeneous sources

• Examples:
– Same person with multiple email addresses

• Data cleaning
– Process of dealing with duplicate data issues
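
A minimal sketch of merging duplicates for the "same person, multiple email addresses" case follows. It assumes, purely for illustration, that a normalized name identifies a person; real data usually needs a more reliable key. The records and the function name `merge_duplicates` are hypothetical.

```python
def merge_duplicates(records):
    """Group (name, email) pairs by normalized name, collecting all emails."""
    merged = {}
    for name, email in records:
        # Normalize case and whitespace so near-duplicates share one key.
        key = " ".join(name.lower().split())
        merged.setdefault(key, set()).add(email.lower())
    return merged


# Made-up records merged from two hypothetical sources.
people = [
    ("Ann Lee", "ann@example.com"),
    ("ann  lee", "a.lee@example.org"),
    ("Bob Roy", "bob@example.com"),
]

merged = merge_duplicates(people)
print(len(merged))                # 2
print(sorted(merged["ann lee"]))  # ['a.lee@example.org', 'ann@example.com']
```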

Difference between Data Analytics and Data Mining:
A key difference between data analytics and data mining is that data
mining does not require any preconceived hypothesis or notions before
tackling the data
...
However, data analysis does need a hypothesis to test, as it is looking for answers
to particular questions
...
With the right software, they are able to collect the
data ready for further analysis
...
From here, a data mining specialist will usually report their
findings to the client, leaving the next steps in someone else’s hands
...
They need to assess the data, figure out patterns, and draw conclusions
...
Data analytics teams need to know the right questions to ask, e.g.:
– relation between reader gender and English paper
– relation between trained and untrained students versus committing errors
– inoculated with vaccine or not inoculated versus died of disease or survived

Data mining usually does not need any visualizations, bar charts, graphs
etc
...
Without a good representation of the data in question, all the
effort put into analyzing the data would not come to
fruition.