Title: Extraction of meaningful insights in data analytics
Description: This document covers the extraction of meaningful insights in data analytics and the background a reader should know about the topic.


Exploratory Data Analysis: The Key to Extracting Meaningful Insights from Data
Exploratory Data Analysis (EDA):
Let us start the kit by understanding the importance of EDA in any data science project
...


Data Set Description:
You can use online platforms to find sample data sets
...
pyplot as plt
data = pd
...
csv')
print(data
...
shape)
print(data
...
dtypes)
print(data
...
To find the number of missing values in a column, you can use the sum()
function after calling isnull()
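As a minimal sketch of this check, assuming a small hypothetical DataFrame (the column names glucose and insulin are illustrative and not taken from the original data set):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
data = pd.DataFrame({
    "glucose": [148, 85, np.nan, 89],
    "insulin": [np.nan, 94, np.nan, 168],
})

# isnull() marks each missing cell as True; sum() then counts the True values per column.
missing_per_column = data.isnull().sum()
print(missing_per_column)
```

The result is a Series indexed by column name, so a single column's count can be read with missing_per_column["insulin"].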
...
In such cases, you can use imputation methods such as
mean or median to replace these zeros
...


To visualize the presence of outliers in the data, you can use a box plot
...
The
presence of outliers can impact the choice of imputation method
...
boxplot() function from the seaborn library
...
savefig() function
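A short sketch of how a box plot could be drawn and saved, assuming seaborn and matplotlib are installed. The data values, the insulin column name, and the output file name are all hypothetical:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical values; a few large numbers stand in for outliers.
data = pd.DataFrame({"insulin": [94, 168, 88, 543, 846, 175, 230, 83, 96, 110]})

fig, ax = plt.subplots()
# Draw a horizontal box plot of the column on the given axes.
sns.boxplot(x=data["insulin"], ax=ax)

# Save the figure to disk; the file name is an assumption for this sketch.
output_path = os.path.join(tempfile.gettempdir(), "insulin_boxplot.png")
fig.savefig(output_path)
plt.close(fig)
```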
...
Outliers in the data can be removed using the concept of quantile
...
This will help in deciding whether to impute the missing values
using mean or median
...
For
example, the records falling above the 99th percentile can be removed using the quantile function
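A minimal sketch of quantile-based filtering, which keeps the records at or below the 99th percentile and drops the ones above it. The data and the insulin column name are hypothetical, and the 0.99 cutoff is only one possible choice:

```python
import pandas as pd

# Hypothetical column: the values 1 to 99 plus one extreme value.
data = pd.DataFrame({"insulin": list(range(1, 100)) + [10_000]})

# 99th percentile of the column; values above it are treated as outliers here.
upper_limit = data["insulin"].quantile(0.99)

# Keep only the rows at or below the cutoff.
filtered = data[data["insulin"] <= upper_limit]
print(len(data), len(filtered))
```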
...


Feature Selection
Feature selection is a useful technique to select only the most crucial features for the analysis
...
Highly
correlated features can be removed and only one of them can be taken for analysis
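One possible way to do this with pandas is sketched below. The column names, the sample values, and the 0.9 threshold are assumptions made for illustration, not values from the original notes:

```python
import numpy as np
import pandas as pd

# Hypothetical data: height_in is almost a rescaled copy of height_cm.
data = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [41, 23, 35, 52, 29],
})

# Absolute pairwise correlations between all numeric columns.
corr = data.corr().abs()

# Keep only the upper triangle so that each pair of features is examined once.
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))

threshold = 0.9  # assumed cutoff for "highly correlated"
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

# Drop the later feature of each highly correlated pair and keep the other.
reduced = data.drop(columns=to_drop)
print(to_drop)
```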
...
distplot function
...
If it is nonsymmetric, imputation should be done using
median or any other algorithm
...
If the data is
biased towards one class, the results will be affected
...
If the data is imbalanced, apply techniques such as
undersampling or oversampling to resolve the problem
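Both ideas can be sketched with plain pandas as simple random resampling. The outcome column and the data are hypothetical, and dedicated libraries offer more refined methods than this sketch:

```python
import pandas as pd

# Hypothetical imbalanced data: six rows of class 0 and two rows of class 1.
data = pd.DataFrame({
    "glucose": [85, 89, 90, 95, 100, 110, 148, 183],
    "outcome": [0, 0, 0, 0, 0, 0, 1, 1],
})

counts = data["outcome"].value_counts()
majority = data[data["outcome"] == counts.idxmax()]
minority = data[data["outcome"] == counts.idxmin()]

# Random oversampling: draw minority rows with replacement until the classes match.
oversampled = pd.concat([
    majority,
    minority.sample(n=len(majority), replace=True, random_state=0),
])

# Random undersampling: keep only as many majority rows as there are minority rows.
undersampled = pd.concat([
    majority.sample(n=len(minority), random_state=0),
    minority,
])

print(oversampled["outcome"].value_counts())
print(undersampled["outcome"].value_counts())
```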
...
For example, if insulin
has a non-symmetric distribution, replace the 0 values with the median using the replace function
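A minimal sketch of this replacement, using hypothetical insulin values. Computing the median from the non-zero values only is an assumption of this sketch, made so that the placeholder zeros do not pull the median down:

```python
import pandas as pd

# Hypothetical readings; 0 stands for a value that was never recorded.
data = pd.DataFrame({"insulin": [0.0, 94.0, 168.0, 0.0, 88.0, 543.0]})

# Median of the recorded (non-zero) values only.
median_insulin = data.loc[data["insulin"] != 0, "insulin"].median()

# replace() swaps every 0 in the column for the median.
data["insulin"] = data["insulin"].replace(0, median_insulin)
print(data)
```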
...


Summary of EDA Tasks
1
...
Check data types
3
...
Check for missing values
5
...
Check distribution with box plots
7
...
Create correlation heat maps
Complete the EDA tasks yourself and try to implement the techniques
mentioned on different data sets
...

Extraction of meaningful insights using SQL:
In this part of the kit, we will focus on SQL: what a database is, the basics of creating a table,
and how SQL is used in real life
...
Tables are used to store data related to different elements
...

Each table has rows and columns, and the data is organized in a relational database with primary and
foreign keys to relate the tables to each other
...
Non-Relational Databases:
In a relational database, data is structured and organized into rows and columns, and tables are related
to each other through primary and foreign keys
...
SQL is mainly used for relational databases
...
It is a powerful language used for data analysis and extracting key metrics from the
data
...
It is used by data analysts and scientists to provide efficient solutions for reporting to
stakeholders
...

In this tutorial, we will create a retail database with three tables: customers, products, and orders
...
Then, we will create the
customers table with attributes such as customer ID, name, city, and state
...
Finally, we will create the orders table
with attributes such as order date, product ID, and customer ID
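The three tables could be sketched as follows, using Python's built-in sqlite3 module so that the SQL can be run without a database server. The column types, the products columns, and the id column on orders are assumptions, since the notes do not give the full definitions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Column types and the products columns are assumed for this sketch.
conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    city  TEXT,
    state TEXT
);
CREATE TABLE products (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    order_date  TEXT,
    product_id  INTEGER REFERENCES products(id),
    customer_id INTEGER REFERENCES customers(id)
);
""")

# List the tables that now exist in the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```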
...
The commands include
CREATE, ALTER, and DROP
...
The commands include INSERT, UPDATE, and DELETE
...
The command SELECT is used, and the WHERE clause can be
used to filter data
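A short sketch of the data commands and of SELECT with a WHERE filter, again run through sqlite3. The customer rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT)"
)

# INSERT adds rows, UPDATE changes them, DELETE removes them.
conn.executemany(
    "INSERT INTO customers (id, name, city, state) VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Austin", "TX"), (2, "Ben", "Boston", "MA"), (3, "Cara", "Dallas", "TX")],
)
conn.execute("UPDATE customers SET city = 'Houston' WHERE id = 3")
conn.execute("DELETE FROM customers WHERE id = 2")

# SELECT reads the data; WHERE keeps only the rows that match the condition.
texas_customers = conn.execute(
    "SELECT name, city FROM customers WHERE state = 'TX' ORDER BY id"
).fetchall()
print(texas_customers)
```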
...
The most common
types of joins are INNER JOIN and OUTER JOIN
...
A foreign key is used to connect
two tables and maintain the relationship and referential integrity between their data
...


Types of Joins:
Left join: get everything from left table and matching data from right table
Right join: get everything from right table and matching data from left table
Inner join: get matching data from both tables
Full join: get all data from both tables
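The difference between a left join and an inner join can be seen on a tiny example. The tables and rows here are hypothetical, and right and full joins are left out because older SQLite versions do not support them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO orders VALUES (10, 1, '2023-01-05');
""")

# LEFT JOIN keeps every customer; customers without orders get NULL (None in Python).
left_rows = conn.execute("""
    SELECT c.name, o.order_date
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id
""").fetchall()

# INNER JOIN keeps only the customers that have a matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.order_date
    FROM customers c
    INNER JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id
""").fetchall()

print(left_rows)
print(inner_rows)
```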
Practical Examples of Joins:
Using left join to extract data for all customers who ordered or didn’t order a product:
SELECT c
...
order_date, o
...
id = o
...
*, o
...
product_id FROM customers c INNER JOIN orders o ON c
...
customer_id

Aggregate Functions:
Aggregate functions like sum, average, minimum, maximum, and count can be used to perform
calculations on a set of values
...
id, DATE_TRUNC('month', o
...
price) AS amount_spent FROM
customers c INNER JOIN orders o ON c
...
customer_id INNER JOIN products p ON o
...
id
GROUP BY c
...
Union and Union All
are used to append data from different tables
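The two operators differ in how they treat duplicates: UNION removes duplicate rows from the combined result, while UNION ALL keeps every row. A small sketch with two hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE online_customers (name TEXT);
CREATE TABLE store_customers (name TEXT);
INSERT INTO online_customers VALUES ('Asha'), ('Ben');
INSERT INTO store_customers VALUES ('Ben'), ('Cara');
""")

# UNION: 'Ben' appears in both tables but only once in the result.
union_rows = conn.execute("""
    SELECT name FROM online_customers
    UNION
    SELECT name FROM store_customers
    ORDER BY name
""").fetchall()

# UNION ALL: every row from both tables is kept, duplicates included.
union_all_rows = conn.execute("""
    SELECT name FROM online_customers
    UNION ALL
    SELECT name FROM store_customers
    ORDER BY name
""").fetchall()

print(union_rows)
print(union_all_rows)
```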
...
Rank, Dense Rank, and Row Number are useful
for extracting key metrics from data
...


Using Aggregate Functions:
Aggregate functions are useful for analyzing data and obtaining summary statistics
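As an illustration, the sketch below computes count, sum, average, minimum, and maximum per customer. The tables, prices, and column aliases are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (1, 10.0), (2, 25.0);
INSERT INTO orders VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

# GROUP BY forms one group per customer; each aggregate is computed within the group.
summary = conn.execute("""
    SELECT o.customer_id,
           COUNT(*)     AS order_count,
           SUM(p.price) AS amount_spent,
           AVG(p.price) AS average_price,
           MIN(p.price) AS cheapest,
           MAX(p.price) AS most_expensive
    FROM orders o
    INNER JOIN products p ON o.product_id = p.id
    GROUP BY o.customer_id
    ORDER BY o.customer_id
""").fetchall()
print(summary)
```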
...

Case statements are useful for flagging data, such as identifying customers whose orders have been
delivered
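A minimal sketch of such a flag, assuming a hypothetical status column on the orders table (the notes do not show how delivery is recorded):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
INSERT INTO orders VALUES (1, 1, 'delivered'), (2, 2, 'shipped'), (3, 3, 'delivered');
""")

# CASE returns 1 for delivered orders and 0 for everything else.
flagged = conn.execute("""
    SELECT customer_id,
           CASE WHEN status = 'delivered' THEN 1 ELSE 0 END AS is_delivered
    FROM orders
    ORDER BY id
""").fetchall()
print(flagged)
```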
...

Rank assigns the same rank to tied rows and skips the following rank numbers after a tie
...

Row Number assigns a unique number to each row
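The three functions are easiest to compare side by side on data that contains a tie. The sales table below is hypothetical, and the sketch assumes an SQLite version with window function support (3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (customer TEXT, amount INTEGER);
INSERT INTO sales VALUES ('Asha', 300), ('Ben', 300), ('Cara', 200), ('Dev', 100);
""")

# Asha and Ben are tied on amount, which is where the three functions differ.
rows = conn.execute("""
    SELECT customer,
           RANK()       OVER (ORDER BY amount DESC)           AS rnk,
           DENSE_RANK() OVER (ORDER BY amount DESC)           AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY amount DESC, customer) AS row_num
    FROM sales
    ORDER BY amount DESC, customer
""").fetchall()

for row in rows:
    print(row)
```

In the output, the tied rows share rank 1 under both RANK and DENSE_RANK. The next row is ranked 3 by RANK and 2 by DENSE_RANK, while ROW_NUMBER simply counts 1, 2, 3, 4.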
...

By extracting key metrics from data, stakeholders can make informed business decisions
...
The primary key is a unique identifier for each row in
the table
...
Break it down into different parts and
focus on one at a time
...
They are not the primary key of the other
table