
Title: PORTER STEMMER
Description: These notes briefly explain the Porter stemmer and its rules, an important part of Natural Language Processing, with examples.



Introduction to Natural Language Processing
Porter Stemmer and its Rules
Q no
...
Its main use is as part of a term normalization process that is usually done when setting up Information Retrieval systems.
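
To make the term-normalization role concrete, here is a minimal sketch of an index that normalizes terms before storing them and before looking them up. It does not implement the Porter stemmer: `toy_normalise` is a hypothetical stand-in that strips only three suffixes, and `build_index`, `search` and the sample documents are likewise invented for illustration.

```python
import re
from collections import defaultdict


def toy_normalise(term):
    """Hypothetical stand-in for a real stemmer: lowercases the term and
    strips one of three suffixes, keeping at least three letters."""
    term = term.lower()
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def build_index(docs):
    """Map each normalized term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[A-Za-z]+", text):
            index[toy_normalise(token)].add(doc_id)
    return index


def search(index, query):
    """Normalize the query term the same way as the indexed terms."""
    return index.get(toy_normalise(query), set())


docs = {
    1: "Connecting the routers",
    2: "The routers connected yesterday",
    3: "A loose cable",
}
index = build_index(docs)
print(search(index, "connects"))  # documents 1 and 2 share the term "connect"
```

Because documents and queries pass through the same normalization, "connecting", "connected" and "connects" all land on one index entry.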


It is more efficient not to use a dictionary, since there is then nothing to maintain when the vocabulary changes.
...




Porter stemmers use simple algorithms to determine which affixes to strip, in which order, and when to apply repair strategies.
...

*v* - the stem contains a vowel
...

*o - the stem ends cvc, where the second c is not W, X or Y (e.g. -WIL, -HOP)
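
The two conditions above can be sketched as small predicates. This is an illustrative sketch, not the notes' own code. It assumes lowercase input and uses the consonant definition from Porter's published algorithm, which does not appear in this extract: a consonant is any letter other than a, e, i, o, u, and other than y preceded by a consonant. The function names are hypothetical.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """True if word[i] is a consonant under Porter's definition
    (assumed here; the definition is not shown in the extract)."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # y counts as a consonant at the start of a word or after a vowel,
        # and as a vowel when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def contains_vowel(stem):
    """Condition *v*: the stem contains a vowel."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_cvc(stem):
    """Condition *o: the stem ends consonant-vowel-consonant and the
    final consonant is not w, x or y."""
    n = len(stem)
    if n < 3:
        return False
    return (
        is_consonant(stem, n - 3)
        and not is_consonant(stem, n - 2)
        and is_consonant(stem, n - 1)
        and stem[-1] not in "wxy"
    )


print(ends_cvc("wil"), ends_cvc("hop"), ends_cvc("snow"))  # True True False
print(contains_vowel("sky"), contains_vowel("tr"))         # True False
```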

In a set of rules written below each other, only one is obeyed
...
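
A short sketch of how "only one is obeyed" can work in practice. Two things here are assumptions not shown in this extract: the four rules are step 1a of Porter's published algorithm, and the tie-break (the rule with the longest matching suffix wins) is taken from the same description. `apply_rule_set` is a hypothetical helper name.

```python
# Step 1a of Porter's published algorithm, as (suffix, replacement) pairs.
STEP_1A_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),
    ("s", ""),
]


def apply_rule_set(word, rules):
    """Apply at most one rule from the set: the one whose suffix is the
    longest match for the word. Words matching no rule are unchanged."""
    matching = [(s1, s2) for s1, s2 in rules if word.endswith(s1)]
    if not matching:
        return word
    s1, s2 = max(matching, key=lambda rule: len(rule[0]))
    return word[: len(word) - len(s1)] + s2


for w in ("caresses", "ponies", "caress", "cats"):
    print(w, "->", apply_rule_set(w, STEP_1A_RULES))
# caresses -> caress
# ponies -> poni
# caress -> caress
# cats -> cat
```

Note that "caress" matches both the SS rule and the S rule; choosing the longer suffix is what keeps the word from being cut down to "cares".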
2 Corpora and Role of Corpora in NLP
Ans: A collection of written text or recorded speech
...

A corpus is a collection of text or speech material that has been brought together according to a certain set of predetermined criteria.
...

Performing these counts enables the researcher not only to search for and spot key words and phrases, but also to examine their concordances (i.e. the words that occur around them).
...
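
A concordance of the kind described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: tokenization is a simple lowercase regular-expression split, and the function name `concordance`, the `window` parameter and the sample sentence are all hypothetical.

```python
import re


def concordance(text, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence
    of keyword, with up to `window` words on each side."""
    tokens = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window): i])
            right = " ".join(tokens[i + 1: i + 1 + window])
            lines.append((left, tok, right))
    return lines


sample = "The river runs past the mill. The old mill stands by the river bank."
for left, kw, right in concordance(sample, "mill", window=2):
    print(f"{left:>10} [{kw}] {right}")
```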


o More annotated text often yields more effective patterns
o Different genres may have different properties
   o Systems can “train” separately on different genres
   o Systems can “train” on one diverse corpus

Other analytic techniques, such as collocation analysis, enable the researcher to identify and extract terms within a corpus that are associated (or, in other words, that collocate) with any other particular word.
...
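
As a rough illustration of collocation analysis, the sketch below counts the words that appear within a fixed window of a chosen node word. It uses raw co-occurrence counts only; real analyses usually rank candidates with an association measure, which is left out here. The function name `collocates`, the `window` parameter and the sample text are assumptions made for the example.

```python
import re
from collections import Counter


def collocates(text, node, window=2):
    """Count words occurring within `window` tokens of each occurrence
    of the node word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            neighbours = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
            counts.update(neighbours)
    return counts


sample = "strong tea, strong tea, weak coffee"
print(collocates(sample, "tea", window=1).most_common())
# [('strong', 3), ('weak', 1)]
```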

Other commonly used techniques include:

o Part-of-speech annotation: grammatical labelling of the words in a corpus;
o Semantic tagging: automatic grouping of words into categories based on meaning;
o Named-entity recognition: the process of automatically locating, classifying and annotating named elements, such as people, organizations or places, in running texts.
...
But to date, relatively little work has sought to add a spatial dimension to corpus analysis, despite the clear coherence of the corpus-based approach with the ideas underlying the field of Geographical Information Systems (GIS).