Title: DATA SCIENCE
Description: It covers various planning model tools, uses of data science, the life cycle of data science, which tools play an important role in data science and why, the skills required to become a data scientist, the importance of Python in data science, etc.
DATA SCIENCE
INTRODUCTION
Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms
and systems to extract knowledge and insights from data in various forms, both structured and
unstructured, similar to data mining
...
It employs techniques and theories
drawn from many fields within the context of mathematics, statistics, information science and
computer science
...
-Data Science is a more forward-looking, exploratory approach that focuses on analysing past or
current data and predicting future outcomes, with the aim of making informed decisions
...
The purpose of data science is to find patterns
...
Most people have heard stories about companies who used data science to predict the future,
revolutionise their business, or even disrupt an industry
...
Although many tools are available on the market, R is the most
commonly used one
...
Here, I present different analytic solutions that help managers understand and segment
their customers based on their purchase history
...
Association Rules:
The association rule algorithm enables us to find interesting relations within multiple
databases and answer questions such as: which products tend to be purchased together? This
algorithm is also used in medical diagnosis, biomedicine, census data analysis, fraud detection,
CRM, recommendation systems and content optimisation
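A minimal sketch of the "bought together" question in Python: it counts how often pairs of items appear in the same basket and reports each rule's support (share of all baskets containing both items) and confidence (share of baskets containing the first item that also contain the second). The baskets and the `pair_rules` helper are hypothetical, and this brute-force pair counting only stands in for full association-rule algorithms such as Apriori, which also handle larger itemsets efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, made up purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (A, B, support, confidence) rules A -> B for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets that contain both items
        if support < min_support:
            continue  # skip rare pairs
        # Confidence of A -> B: of the baskets containing A, how many also contain B?
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for antecedent, consequent, support, confidence in pair_rules(transactions):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Raising `min_support` prunes rare pairs (here the single beer-and-bread basket), which keeps the rule list focused on patterns that occur often enough to act on.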
...
Today, these data science methods are used in finance, economics, biology, engineering,
retail and manufacturing
...
The company also wishes to predict which valuable employees will leave next
...
Scraping, Geocoding and Emailing:
At work, we all need relevant information delivered in a timely manner and data science can
help to do that
...
SQL, or Structured Query Language, is a special-purpose programming language for
managing data held in relational database management systems
...
Some of what you can do with SQL—data insertion, queries, updating and deleting, schema
creation and modification, and data access control—you can also accomplish with R, Python,
or even Excel, but writing your own SQL code is more efficient, yields easily reproducible
scripts, and keeps you closer to the data, according to that
...
Things you can do with SQL
...
Handle dates:
"Fantastic date functions" exist to meet all your formatting and type conversion needs
...
Find the median:
Since there's no built-in aggregate function for the median, that resource provides the code for it
...
Generate sequences:
Use the generate_series function to create ranges of dates and times and to handle time series
and funnels
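To illustrate the median and sequence ideas, here is a sketch that runs SQL through Python's built-in `sqlite3` module against a made-up `orders` table. SQLite, like many databases, typically ships without a median aggregate, so the median is computed by sorting and averaging the middle value (odd count) or two middle values (even count). A recursive CTE stands in for PostgreSQL's `generate_series`. Date functions differ between databases, so treat these queries as a starting point rather than portable code.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-02", 40.0), ("2024-01-02", 25.0), ("2024-01-04", 5.0)],
)

# Median without a built-in aggregate: sort, skip to the middle, and average
# one row (odd count) or two rows (even count).
MEDIAN_SQL = """
SELECT AVG(amount) FROM (
    SELECT amount FROM orders
    ORDER BY amount
    LIMIT 2 - (SELECT COUNT(*) FROM orders) % 2
    OFFSET (SELECT (COUNT(*) - 1) / 2 FROM orders)
)
"""
median = conn.execute(MEDIAN_SQL).fetchone()[0]
print("median amount:", median)

# A date sequence from a recursive CTE, left-joined to the orders so that
# days without any orders still appear with a total of 0.
DAILY_SQL = """
WITH RECURSIVE days(day) AS (
    SELECT '2024-01-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2024-01-04'
)
SELECT days.day, COALESCE(SUM(orders.amount), 0)
FROM days LEFT JOIN orders ON orders.order_date = days.day
GROUP BY days.day
ORDER BY days.day
"""
daily_totals = conn.execute(DAILY_SQL).fetchall()
for day, total in daily_totals:
    print(day, total)
```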
...
A healthy market share – SAS still holds the biggest market share in terms of jobs; even in
advanced markets like the US and the UK, SAS's share of the job market would be at least 40%
...
Ease of learning and awesome support – Among all the tools I know, SAS would probably
qualify as the easiest to learn
...
WHAT ARE THE SKILLS OF DATA SCIENTIST?
-critical thinking
-coding
-mathematics
-machine learning, deep learning
-communication
-data architecture
-risk analysis, process improvement, systems engineering
-problem solving
-dealing with structured and unstructured data
WHAT DO YOU DO AS A DATA SCIENTIST?
"More generally, a data scientist is someone who knows how to extract meaning from and
interpret data, which requires both tools and methods from statistics and machine learning, as
well as being human
...
This trend is most pronounced among individual contributors—at level 1, data scientists with
a PhD earn a median base salary of $102,000 while those with a Master's degree earn a
median base salary of $92,500
...
USES OF DATA SCIENCE:
Using data science, companies have become intelligent enough to push and sell products according
to customers' purchasing power and interests
...
All these search engines (including Google) use data science algorithms to deliver
the best results for a search query in a fraction of a second
...
Had there been no data science, Google
wouldn't have been the 'Google' we know today
...
Starting from the
display banners on various websites to the digital billboards at airports, almost all of
them are decided using data science algorithms
...
They can be targeted based on a user's past behaviour
...
3) Recommender Systems:
Who can forget the suggestions about similar products on Amazon? They not only help you
find relevant products among the billions of products available, but also add a lot to the
user experience
...
Internet giants
like Amazon, Twitter, Google Play, Netflix, LinkedIn, IMDb and many more use this system
to improve user experience
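One common way to find "similar products" is item-to-item similarity: two products count as similar when the same users rate them alike. The sketch below, with an invented `ratings` dataset and hypothetical helper names, scores products by the cosine similarity of their rating vectors. It only illustrates the technique; it is not a description of how Amazon or any other company actually builds its recommendations.

```python
from math import sqrt

# Hypothetical user -> {product: rating} data, invented for illustration.
ratings = {
    "ana":   {"laptop": 5, "mouse": 4, "keyboard": 4},
    "ben":   {"laptop": 4, "mouse": 5},
    "chris": {"mouse": 4, "keyboard": 5, "monitor": 4},
    "dina":  {"laptop": 5, "monitor": 2},
}


def item_vectors(data):
    """Pivot user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, prefs in data.items():
        for item, score in prefs.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    common = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in common)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similar_items(target, data, top_n=2):
    """Return the top_n items whose rating vectors are closest to the target's."""
    items = item_vectors(data)
    scores = [(other, cosine(items[target], vec))
              for other, vec in items.items() if other != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(similar_items("laptop", ratings))
```

Because similarity is computed between items rather than users, the item-to-item scores can be precomputed once and reused for every shopper who views a given product.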
...
4) Image Recognition:
You upload your image with friends on Facebook and you start getting suggestions to tag
your friends
...
Similarly, when using WhatsApp Web, you scan a QR code in your web browser using your
mobile phone
...
It uses image recognition and provides related search results
...
Using the speech
recognition feature, even if you aren't in a position to type a message, your life doesn't
stop
...
However, at times you may notice that speech recognition doesn't perform accurately
...
Games are now designed using machine learning algorithms which
improve or upgrade themselves as the player moves up to a higher level
...
7) Price Comparison Websites:
At a basic level, these websites are being driven by lots and lots of data which is fetched
using APIs and RSS Feeds
...
PriceGrabber, PriceRunner, Junglee,
Shopzilla, DealTime are some examples of price comparison websites
...
8) Airline Route Planning:
The airline industry across the world is known to bear heavy losses
...
The sharp rise in air fuel prices and the need to offer heavy discounts to customers have
made the situation worse
...
Now using data science, the airline
companies can:
-Predict flight delays
-Decide which class of airplanes to buy
-Decide whether to land directly at the destination or to take a halt in between (for example, a flight
can have a direct route from New Delhi to New York
...
)
-Effectively drive customer loyalty programs
...
9) Fraud and Risk Detection:
One of the first applications of data science originated in the finance discipline
...
However, they had a lot of data that used to be collected during the initial paperwork while
sanctioning loans
...
Over the years, banking companies learned to divide and conquer data via customer
profiling, past expenditures and other essential variables to analyze the probabilities of risk
and default
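In its simplest form, profiling means grouping past customers by a few variables and measuring how often each group defaulted. The sketch below does exactly that on hypothetical loan records (the variables, values and `default_rates` helper are invented for illustration). Production risk models would typically fit a statistical model, such as logistic regression, on many more variables, but these empirical rates show the basic idea.

```python
from collections import defaultdict

# Hypothetical historical loans: (income_band, had_late_payment, defaulted)
loans = [
    ("low", True, True), ("low", True, False), ("low", False, False),
    ("low", True, True), ("high", False, False), ("high", True, False),
    ("high", False, False), ("high", True, True), ("low", False, True),
    ("high", False, False),
]


def default_rates(records):
    """Empirical default probability for each (income_band, late_payment) profile."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for income, late, defaulted in records:
        profile = (income, late)
        totals[profile] += 1
        defaults[profile] += defaulted  # a True flag adds 1, False adds 0
    return {profile: defaults[profile] / totals[profile] for profile in totals}


rates = default_rates(loans)
for profile, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(profile, f"{rate:.0%}")
```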
...
10) Delivery logistics:
Who says data science has limited applications? Logistics companies like DHL, FedEx and UPS
have used data science to improve their operational efficiency
...
Furthermore, the data that these companies generate using installed GPS devices provides them with
many possibilities to explore using data science
...
Policies and every possible industry where data gets generated
...
In addition, predicting the wallet share
of a customer, which customer is likely to churn, which customer should be pitched for high
value product and many other questions can be easily answered by data science
...
12) Coming Up In Future:
Though not much has been revealed about them except the prototypes, and I don't know
when they will be available to the common man
...
We
need to wait and see how successful Google will be with its self-driving car
project
...
Let's see what the future holds for us!
LIFECYCLE OF DATA SCIENCE
1
...
They're the people
who want to ensure that every decision made in the company is supported by concrete data,
and that it is guaranteed (with a high probability) to achieve results
...
According to Microsoft Azure's blog, we typically use data science to answer five types of
questions:
-How much or how many? (Regression)
-Which category? (Classification)
-Which group? (Clustering)
-Is this weird? (Anomaly detection)
-Which option should be taken? (Recommendation)
2
...
Data mining is the process of gathering your data from different sources
...
At this stage, some of the questions worth
considering are - what data do I need for my project? Where does it live? How can I obtain
it? What is the most efficient way to store and access all of it?
If all the data necessary for the project is packaged and handed to you, you've won the
lottery
...
If the data lives
in databases, your job is relatively simple - you can query the relevant data using SQL
queries, or manipulate it using a data frame tool like Pandas
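For the database case, a minimal sketch assuming a hypothetical `customers` table in an in-memory SQLite database: SQL does the filtering close to the data, and `pandas.read_sql_query` hands the result to a Pandas DataFrame for further manipulation.

```python
import sqlite3

import pandas as pd

# Hypothetical customers table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Delhi", 120.0), (2, "Mumbai", 80.0), (3, "Delhi", 200.0), (4, "Pune", 50.0)],
)

# Step 1: let SQL select and filter the relevant rows (parameterised query).
df = pd.read_sql_query(
    "SELECT city, spend FROM customers WHERE spend >= ?", conn, params=(60,)
)

# Step 2: continue in Pandas, e.g. average spend per city.
per_city = df.groupby("city")["spend"].mean()
print(per_city)
```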
...
Beautiful
Soup is a popular library used to scrape web pages for data
...
Analytics, for example, allows you to define custom events within the app which can help
you understand how your users behave and collect the corresponding data
...
Data Cleaning:
Now that you've got all of your data, we move on to the most time-consuming step of all: cleaning and preparing the data
...
According to interviews with data scientists, this process (also referred to as 'data janitor
work') can often take 50 to 80 per cent of their time
...
For instance, the data could also have
inconsistencies within the same column, meaning that some rows could be labelled 0 or 1,
and others could be labelled no or yes
...
If we're dealing with a categorical data type with multiple categories,
some of the categories could be misspelled or have different cases, such as having categories
for both male and Male
...
One issue that is often overlooked at this stage, causing a lot of problems later on, is the
presence of missing data
...
One option is to
ignore the instances that have any missing values
...
Another common approach is to use something called average imputation, which replaces
missing values with the average of all the other instances
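The cleaning steps described above (unifying case, reconciling 0/1 with no/yes labels, and handling missing values by dropping rows or by average imputation) can be sketched in Pandas as follows. The column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical raw survey data showing the problems described above.
raw = pd.DataFrame({
    "gender": ["male", "Male", "female", "FEMALE", None],
    "subscribed": [1, "yes", 0, "no", "Yes"],
    "age": [23.0, None, 31.0, 40.0, 28.0],
})

clean = raw.copy()

# 1. Normalise case and whitespace so "male" and "Male" become one category.
clean["gender"] = clean["gender"].str.strip().str.lower()

# 2. Map the mixed 0/1 and no/yes labels onto a single boolean encoding.
label_map = {"1": True, "yes": True, "0": False, "no": False}
clean["subscribed"] = clean["subscribed"].astype(str).str.lower().map(label_map)

# 3a. Option one: drop every row that has any missing value.
dropped = clean.dropna()

# 3b. Option two: average (mean) imputation for the numeric column.
imputed = clean.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())

print(dropped)
print(imputed)
```

Dropping rows is simple but throws away information (two of the five rows here), while mean imputation keeps every row at the cost of shrinking the column's variance.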
...
4
...
The data exploration stage is like the brainstorming of data analysis
...
It could involve pulling up
and analysing a random subset of the data using Pandas, plotting a histogram or distribution
curve to see the general trend, or even creating an interactive visualization that lets you dive
down into each data point and explore the story behind the outliers
...
If you were predicting student scores for example, you could try visualizing
the relationship between scores and sleep
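A first pass at exploring the sleep-versus-score example might look like this in Pandas. The data is made up, and the correlation coefficient only measures a linear association: it is a lead to investigate, not proof that more sleep causes higher scores.

```python
import pandas as pd

# Made-up sample: hours of sleep and exam score for eight students.
students = pd.DataFrame({
    "sleep_hours": [5, 6, 6, 7, 7, 8, 8, 9],
    "score":       [58, 62, 65, 70, 74, 78, 80, 83],
})

# Quick numeric summary of each column (count, mean, spread, quartiles).
print(students.describe())

# Strength of the linear relationship between sleep and score (Pearson r).
r = students["sleep_hours"].corr(students["score"])
print(f"correlation between sleep and score: {r:.2f}")

# A crude text histogram of scores in 10-point bins.
bins = pd.cut(students["score"], bins=list(range(50, 91, 10)))
for interval, count in bins.value_counts(sort=False).items():
    print(interval, "#" * int(count))
```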
...
5
...
If we were predicting the scores of a student, a possible feature is the amount of
sleep they get
...
According to Andrew Ng, one of the top experts in the fields of machine learning and deep
learning, "Coming up with features is difficult, time-consuming, requires expert knowledge
..."
Feature engineering is the
process of using domain knowledge to transform your raw data into informative features that
represent the business problem you are trying to solve
...
We typically perform two types of tasks in feature engineering - feature selection
and construction
...
This is typically done to avoid the curse of dimensionality, which refers to the
increased complexity that arises from high-dimensional spaces (i.e. way too many features)
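A sketch of both feature-engineering tasks on a hypothetical version of the student dataset: `sleep_hours` is constructed from bedtime and wake-up time, then selection drops a constant column and keeps the `k` features most correlated with the score. Correlation ranking is just one simple selection heuristic, chosen here for brevity; the column names, the data and `k = 2` are all assumptions.

```python
import pandas as pd

# Hypothetical raw data for the student-score example.
raw = pd.DataFrame({
    "bedtime_hour":  [23, 22, 24, 21, 23, 22],  # hour of day, 24 = midnight
    "wake_hour":     [6, 7, 6, 6, 8, 7],
    "study_minutes": [60, 90, 30, 120, 45, 100],
    "school_id":     [1, 1, 1, 1, 1, 1],        # constant, carries no information
    "score":         [62, 78, 50, 88, 64, 81],
})

# Feature construction: derive hours of sleep from two raw columns.
features = raw.drop(columns="score").copy()
features["sleep_hours"] = (24 - features["bedtime_hour"]) + features["wake_hour"]

# Feature selection, step 1: drop columns with zero variance.
features = features.loc[:, features.var() > 0]

# Feature selection, step 2: keep the k features most correlated with the target.
k = 2
relevance = features.corrwith(raw["score"]).abs().sort_values(ascending=False)
selected = list(relevance.index[:k])
print(relevance)
print("selected features:", selected)
```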
...
6
...
I use the term predictive modelling because I think a good project is not one that just
trains a model and obsesses over the accuracy, but also uses comprehensive statistical
methods and tests to ensure that the outcomes from the model actually make sense and are
significant
...
This is never an easy decision, and
there is no single right answer
...
There are a couple of different cheat sheets available online which have
a flowchart that helps you decide the right algorithm based on the type of classification or
regression problem you are trying to solve
...
Once you've trained your model, it is critical that you evaluate its success
...
It involves
separating the dataset into k equally sized groups of instances, training on all the groups
except one, testing on the group that was left out, and repeating the process with a different
group left out each time
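The k-fold procedure can be written out in plain Python. This is a minimal sketch with hypothetical helper names; the "model" is a deliberately trivial baseline that predicts the training mean, so the focus stays on how the folds are formed and scored. Real projects usually shuffle the data first and use a library implementation such as scikit-learn's `KFold`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    Indices are taken in order; shuffle the data beforehand in real use.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                # the held-out group
        train_idx = indices[:start] + indices[start + size:]  # all other groups
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Trivial baseline 'model': always predict the mean training target."""
    return sum(ys) / len(ys)


def mse(model, xs, ys):
    """Mean squared error of the constant prediction `model` on one fold."""
    return sum((y - model) ** 2 for y in ys) / len(ys)


xs = list(range(10))
ys = [2.0 * x for x in xs]
print("5-fold CV MSE of the mean baseline:", cross_validate(xs, ys, 5, fit_mean, mse))
```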
...
For classification models, we often test accuracy using PCC (per cent correct classification),
along with a confusion matrix which breaks down the errors into false positives and false
negatives
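PCC and the confusion matrix are easy to compute by hand. The labels below are invented (1 = the employee leaves, 0 = the employee stays), and the helper names are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def pcc(actual, predicted):
    """Per cent correct classification (plain accuracy), as a percentage."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100 * correct / len(actual)


# Made-up labels: 1 = employee leaves, 0 = employee stays.
actual    = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]

cm = confusion_counts(actual, predicted)
print(f"PCC: {pcc(actual, predicted):.0f}%")
print("           predicted 1  predicted 0")
print(f"actual 1   TP={cm['TP']:<9} FN={cm['FN']}")
print(f"actual 0   FP={cm['FP']:<9} TN={cm['TN']}")
```

The matrix matters because PCC alone can hide which mistakes are made: a model that flags a stayer as leaving (FP) and one that misses a real leaver (FN) can have the same PCC but very different business costs.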
...
For a regression model, the
common metrics include the coefficient of determination (which gives information about the
goodness of fit of a model), mean squared error (MSE), and mean absolute error (MAE)
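The three regression metrics follow directly from the errors between actual and predicted values. A minimal sketch with a hypothetical `regression_metrics` helper and made-up numbers:

```python
def regression_metrics(actual, predicted):
    """Return (R^2, MSE, MAE) for paired lists of true and predicted values.

    R^2 is undefined when all actual values are equal (total sum of squares is 0).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return r2, mse, mae


actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]
r2, mse, mae = regression_metrics(actual, predicted)
print(f"R^2={r2:.3f}  MSE={mse:.3f}  MAE={mae:.3f}")
```

MSE squares each error, so it punishes a few large misses more heavily than MAE does; reporting both gives a sense of whether the errors are evenly spread or dominated by outliers.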
...
Data Visualization:
Data visualization is a tricky field, mostly because it seems simple but it could possibly be
one of the hardest things to do well
...
Once you've derived the intended
insights from your model, you have to represent them in a way that the different key
stakeholders in the project can understand
...
Business Understanding:
Now that you've gone through the entire lifecycle, it's time to go back to the drawing board
...
This is where you evaluate how the
success of your model relates to your original business understanding
...
WHY DOES DATA SCIENCE USE PYTHON?
Python is one of the simplest programming languages
...
Python has many high-quality libraries that help with data science tasks
...
You may write in one line what in other languages can take up to
100 lines
...
It's the best compromise between scale and sophistication (in terms of data processing)
...
Python is a powerful language
...
There are plenty of Python scientific packages for data visualization, machine learning,
natural language processing, complex data analysis and more
...
The most popular libraries and tools for data science are:
Pandas: a library for data manipulation and analysis
...
NumPy: the fundamental package for scientific computing with Python, adding support for
large, multi-dimensional arrays and matrices, along with a large library of high-level
mathematical functions to operate on these arrays
...
If you want more information and want to learn about more libraries and tools, you can check this
article on the most popular Python scientific libraries
...
The ability of computers to learn from examples instead of operating
strictly according to previously written rules is an exciting way of solving problems
...
You can have a
machine learning solution running in no time! If you want to start with ML in Python, I highly
recommend reading an article
The main strength of Python (CPython, to be specific) for Data Science is the availability of
one key library, namely NumPy, that enables leveraging SIMD (single instruction, multiple
data) vectorization
...
NumPy (and then in turn SciPy and the SciKits) made it easy to build efficient matrix-based
computations while using a high-level general-purpose language (unlike MATLAB), which is
really important for adoption by scientists and applied mathematicians
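A rough way to see what vectorization buys: the sketch below computes the same sum of price * quantity products with a Python loop and with a single NumPy call, on randomly generated, made-up data. Whether NumPy actually uses SIMD instructions for a given operation depends on the NumPy build and the CPU, and timings vary from machine to machine, so read the output as an illustration rather than a benchmark.

```python
import time

import numpy as np

n = 200_000
rng = np.random.default_rng(seed=0)
prices = rng.uniform(1.0, 100.0, size=n)   # made-up prices
quantities = rng.integers(1, 10, size=n)   # made-up quantities

# Pure-Python loop: one interpreted multiply-add per element.
start = time.perf_counter()
total_loop = 0.0
for p, q in zip(prices.tolist(), quantities.tolist()):
    total_loop += p * q
loop_seconds = time.perf_counter() - start

# Vectorised NumPy: the whole dot product runs in compiled code.
start = time.perf_counter()
total_vec = float(np.dot(prices, quantities))
vec_seconds = time.perf_counter() - start

print(f"loop:       {total_loop:.2f} in {loop_seconds:.4f}s")
print(f"vectorised: {total_vec:.2f} in {vec_seconds:.4f}s")
```

The two totals can differ in the last few digits because floating-point additions happen in a different order; that is expected and not a bug.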
...
As a consequence, the scientific
ecosystem around CPython is very lively, so there are many libraries ready to use and a lot
of know-how available to teach other people how to use and build them
...
-Interfacing a JVM with native code that can in turn be vectorized is pretty hard (the JNI is
notoriously difficult and error prone) and has performance trade-offs that come with the cost
of moving data in and out of the JVM
...
This year, artificial intelligence will move from novelty to need as
companies integrate it directly into platforms, automating data analysis in a new way
...
While those insights certainly inform better decision
making, they involve some inefficiency; reports must be generated and insights must be
analysed and discussed before decisions can be made
...
By embedding intelligence
into business processes, optimizations can be made automatically, drastically increasing
efficiency
...
Quantitative funds in financial services are driven by machine learning,
health care is employing data science in genomics, and Amazon's Alexa as well as other
consumer products use artificial intelligence to sell more and create personal
experiences
...
-Precise and efficient syntax
...
-Built-in modules providing standardized solutions
...
-Rapid development
...
-Libraries are platform independent
...