NicolasKieffer/tdm-teeft

tdm-teeft

tdm-teeft is a tdm module for term extraction from unstructured text. It can be used to extract the keywords of a document.

Installation

Using npm:

$ npm i -g tdm-teeft
$ npm i --save tdm-teeft

Using Node:

/* require of Teeft module */
const Teeft = require('tdm-teeft');

/* Build new Instance of Tagger */
let tagger = new Teeft.Tagger();

/* Build new Instance of Filter */
let filter = new Teeft.Filter();

/* Build new Instance of Indexator */
let indexator = new Teeft.Indexator();

/* Build new Instance of TermExtraction */
let termextraction = new Teeft.TermExtraction();

Launch tests

$ npm run test

Build documentation

$ npm run docs

API Documentation

Classes

Filter
Indexator
Tagger
TermExtractor

Filter

Kind: global class

new Filter([options])

Returns: Filter - An instance of Filter

Param Type Description
[options] Object Options of constructor
[options.minOccur] Number Minimum number of occurrences
[options.noLimitStrength] Number Strength limit
[options.lengthSteps] Object Length steps

Example (Example usage of 'constructor' (with parameters))

let options = {
  // Assigns a 'value' depending on the length of the indexed text (number of tokens)
  'lengthSteps': {
    'values': [ // intermediate steps are stored here
      { // value 4 is used when 1000 < text length <= 3000 tokens
        'lim': 3000, // must be > 'lengthSteps.min.lim' and < 'lengthSteps.max.lim'
        'value': 4
      },
      { // value 5 is used when 3000 < text length <= 4000 tokens
        'lim': 4000, // must be > 'lengthSteps.min.lim' and < 'lengthSteps.max.lim'
        'value': 5
      }
    ],
    'min': { // value 1 is used when text length <= 1000 tokens
      'lim': 1000,
      'value': 1
    },
    'max': { // value 7 is used when text length > 6000 tokens
      'lim': 6000,
      'value': 7
    }
  },
  'minOccur': 3, // Default minimum number of occurrences (of tokens), here 3. This value is updated according to the length of the indexed text when 'configure' is called
  'noLimitStrength': 2 // Strength limit
  },
  defaultFilter = new Filter(options);
// returns an instance of Filter with properties:
// - minOccur : 3
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}, {'lim': 4000, 'value': 5}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}
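
The comments in the example above state that each intermediate 'lim' must lie strictly between 'lengthSteps.min.lim' and 'lengthSteps.max.lim'. A minimal sketch of checking that constraint before building a Filter could look like the following. The function `validateLengthSteps` and the two sample objects are hypothetical and are not part of tdm-teeft.

```javascript
// Hypothetical validator, not part of tdm-teeft.
// Returns a list of human-readable problems (empty when the object is consistent).
function validateLengthSteps(steps) {
  const errors = [];
  if (steps.min.lim >= steps.max.lim) {
    errors.push("'min.lim' must be < 'max.lim'");
  }
  steps.values.forEach((step, i) => {
    // Each intermediate limit must sit strictly between the two bounds.
    if (step.lim <= steps.min.lim || step.lim >= steps.max.lim) {
      errors.push(`values[${i}].lim (${step.lim}) must be between min.lim and max.lim`);
    }
  });
  return errors;
}

const validSteps = {
  values: [{ lim: 3000, value: 4 }, { lim: 4000, value: 5 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};

const invalidSteps = {
  values: [{ lim: 8000, value: 4 }], // above max.lim
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};

console.log(validateLengthSteps(validSteps));   // []
console.log(validateLengthSteps(invalidSteps)); // one error message
```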

Example (Example usage of 'constructor' (with default values))

let defaultFilter = new Filter();
// returns an instance of Filter with properties :
// - minOccur : 7
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

filter.call(occur, strength) ⇒ Boolean

Check values against the filter conditions

Kind: instance method of Filter
Returns: Boolean - true if the conditions are met

Param Type Description
occur Number Occurrence value
strength Number Strength value

Example (Example usage of 'call' function)

let defaultFilter = new Filter();
defaultFilter.configure(500);
defaultFilter.call(1, 1); // returns true
defaultFilter.configure(5000);
defaultFilter.call(1, 1); // returns false
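
The README does not spell out the rule behind 'call'. One plausible reading, consistent with the example above and with the option names, is that a term passes when it is strong enough ('noLimitStrength') or frequent enough ('minOccur'). The sketch below only illustrates that assumption. `passesFilter` is a hypothetical function and may differ from the actual tdm-teeft logic.

```javascript
// Hypothetical predicate, not the actual tdm-teeft implementation.
function passesFilter(occur, strength, minOccur, noLimitStrength) {
  // Assumption: terms that are strong enough bypass the occurrence threshold.
  if (strength >= noLimitStrength) return true;
  // Otherwise the term must occur often enough.
  return occur >= minOccur;
}

// minOccur = 1 (as after configure(500) with default options)
console.log(passesFilter(1, 1, 1, 2)); // true
// minOccur = 7 (as after configure(5000) with default options)
console.log(passesFilter(1, 1, 7, 2)); // false
// A term with strength 2 passes regardless of its occurrence count
console.log(passesFilter(1, 2, 7, 2)); // true
```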

filter.configure(length) ⇒ Number

Configure the filter according to lengthSteps

Kind: instance method of Filter
Returns: Number - The configured minOccur value

Param Type Description
length Number Text length

Example (Example usage of 'configure' function)

let defaultFilter = new Filter();
defaultFilter.configure(500); // returns 1
defaultFilter.configure(5000); // returns 7
defaultFilter.configure('test'); // returns null
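
To make the step selection concrete, here is a minimal sketch that reproduces the three return values shown above for the default 'lengthSteps'. It assumes that intermediate steps are sorted by ascending 'lim' and that any length beyond the last intermediate step falls back to the 'max' value. `pickMinOccur` is a hypothetical helper, not the library's own code.

```javascript
// Hypothetical helper, not part of tdm-teeft: picks a threshold from a
// lengthSteps-like object for a given text length (in tokens).
function pickMinOccur(lengthSteps, length) {
  // Mirror the documented behaviour of returning null for non-numeric input.
  if (typeof length !== 'number' || Number.isNaN(length)) return null;
  // Short texts use the minimum step.
  if (length <= lengthSteps.min.lim) return lengthSteps.min.value;
  // Assumption: intermediate steps are sorted by ascending 'lim'.
  for (const step of lengthSteps.values) {
    if (length <= step.lim) return step.value;
  }
  // Assumption: anything beyond the last intermediate step uses the maximum value.
  return lengthSteps.max.value;
}

// Default lengthSteps as listed in the constructor example.
const defaultSteps = {
  values: [{ lim: 3000, value: 4 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};

console.log(pickMinOccur(defaultSteps, 500));    // 1
console.log(pickMinOccur(defaultSteps, 2000));   // 4
console.log(pickMinOccur(defaultSteps, 5000));   // 7
console.log(pickMinOccur(defaultSteps, 'test')); // null
```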

Indexator

Kind: global class

new Indexator([options])

Returns: Indexator - An instance of Indexator

Param Type Description
[options] Object Options of constructor
[options.filter] Filter Options given to extractor of this instance of Indexator
[options.lexicon] Object Lexicon used by tagger of this instance of Indexator
[options.stopwords] Object Stopwords used by this instance of Indexator
[options.lemmatizer] Object Lemmatizer used by tagger of this instance of Indexator
[options.stemmer] Object Stemmer used by this instance of Indexator
[options.dictionary] Object Dictionary used by this instance of Indexator

Example (Example usage of 'constructor' (with parameters))

let options = {
    'filter': customFilter // Assuming customFilter contains your custom settings
  },
  indexator = new Indexator(options);
// returns an instance of Indexator with custom Filter

Example (Example usage of 'constructor' (with default values))

let indexator = new Indexator();
// returns an instance of Indexator with default options

indexator.tokenize(text) ⇒ Array

Extract tokens from a text

Kind: instance method of Indexator
Returns: Array - Array of tokens

Param Type Description
text String Fulltext

Example (Example usage of 'tokenize' function)

let indexator = new Indexator();
indexator.tokenize('my sample sentence'); // return ['my', 'sample', 'sentence']

indexator.translateTag(tag) ⇒ String

Translate a Tagger tag into a Lemmatizer tag

Kind: instance method of Indexator
Returns: String - The matching Lemmatizer tag (or false)

Param Type Description
tag String Tag given by Tagger

Example (Example usage of 'translateTag' function)

let indexator = new Indexator();
indexator.translateTag('RB'); // return 'adv'
indexator.translateTag('JJ'); // return 'adj'
indexator.translateTag('NN'); // return 'noun'
indexator.translateTag('NNP'); // return 'noun'
indexator.translateTag('VBG'); // return 'verb'
indexator.translateTag('VBN'); // return 'verb'
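
The examples above suggest that tags sharing a prefix map to the same category (for instance 'NN' and 'NNP' both give 'noun'). A minimal sketch of such a prefix-based mapping follows. The table `TAG_PREFIXES` and the function `translateTagSketch` are hypothetical, inferred only from the listed examples, and the real mapping may cover more tags or use a different rule.

```javascript
// Hypothetical mapping, inferred only from the documented examples.
const TAG_PREFIXES = [
  ['RB', 'adv'],
  ['JJ', 'adj'],
  ['NN', 'noun'],
  ['VB', 'verb']
];

// Returns the category of the first matching prefix, or false when none matches.
function translateTagSketch(tag) {
  for (const [prefix, category] of TAG_PREFIXES) {
    if (tag.startsWith(prefix)) return category;
  }
  return false;
}

console.log(['RB', 'JJ', 'NN', 'NNP', 'VBG', 'VBN', 'DT'].map(translateTagSketch));
// [ 'adv', 'adj', 'noun', 'noun', 'verb', 'verb', false ]
```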

indexator.sanitize(terms) ⇒ Array

Sanitize a list of terms (with some filters)

Kind: instance method of Indexator
Returns: Array - List of sanitized terms

Param Type Description
terms Array List of terms

Example (Example usage of 'sanitize' function)

let indexator = new Indexator();
indexator.sanitize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.lemmatize(terms) ⇒ Array

Lemmatize a list of tagged terms (add a property lemma & stem)

Kind: instance method of Indexator
Returns: Array - List of tagged terms with a lemma

Param Type Description
terms Array List of tagged terms

Example (Example usage of 'lemmatize' function)

let indexator = new Indexator();
indexator.lemmatize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.index(data) ⇒ Object

Index a fulltext

Kind: instance method of Indexator
Returns: Object - A representation of the full text (indexation plus additional information/statistics about tokens/terms)

Param Type Description
data String Full text to be indexed

Example (Example usage of 'index' function)

let indexator = new Indexator();
indexator.index('This is a sample sentence'); // return an object representation of indexation

Indexator.compare(a, b) ⇒ Number

Compare the specificity of two objects

Kind: static method of Indexator
Returns: Number - -1, 1, or 0

Param Type Description
a Object First object
b Object Second object

Example (Example usage of 'compare' function)

Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 2 }); // return 1
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 1 }); // return 0
Indexator.compare({ 'term': 'a', 'specificity': 2 }, { 'term': 'b', 'specificity': 1 }); // return -1
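
The return values above match a comparator that orders objects by descending specificity, which is the shape expected by Array.prototype.sort. The sketch below uses a hypothetical stand-in, `compareBySpecificity`, written only to reproduce the three documented results. It is not the library's own function.

```javascript
// Hypothetical stand-in reproducing the documented return values of compare().
function compareBySpecificity(a, b) {
  if (a.specificity > b.specificity) return -1; // a is more specific: a comes first
  if (a.specificity < b.specificity) return 1;  // b is more specific: b comes first
  return 0;
}

const terms = [
  { term: 'a', specificity: 1 },
  { term: 'b', specificity: 3 },
  { term: 'c', specificity: 2 }
];

// Copy before sorting so the original array is left untouched.
const sorted = [...terms].sort(compareBySpecificity);
console.log(sorted.map((t) => t.term)); // [ 'b', 'c', 'a' ]
```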

Tagger

Kind: global class

new Tagger([options])

Returns: Tagger - An instance of Tagger

Param Type Description
[options] Object Options of constructor

Example (Example usage of 'constructor' (with parameters))

let lexicon = { ... },
  tagger = new Tagger(lexicon);
// returns an instance of Tagger with custom lexicon

Example (Example usage of 'constructor' (with default values))

let tagger = new Tagger();
// returns an instance of Tagger with default lexicon

tagger.tag(terms) ⇒ Array

Tag terms

Kind: instance method of Tagger
Returns: Array - List of tagged terms

Param Type Description
terms Array List of terms

Example (Example usage of 'tag' function)

let tagger = new Tagger();
tagger.tag(['this', 'is', 'a', 'test']); // return [{ 'term': 'this', 'tag': 'DT' }, { 'term': 'is', 'tag': 'VBZ' }, { 'term': 'a', 'tag': 'DT' }, { 'term': 'test', 'tag': 'NN' }]

TermExtractor

Kind: global class

new TermExtractor([options])

Returns: TermExtractor - An instance of TermExtractor

Param Type Description
[options] Object Options of constructor
[options.tagger] Tagger An instance of Tagger
[options.filter] Filter An instance of Filter

Example (Example usage of 'constructor' (with parameters))

let myTagger = new Tagger(), // Assuming myTagger contains your custom settings
  myFilter = new Filter(), // Assuming myFilter contains your custom settings
  termExtractor = new TermExtractor({ 'tagger': myTagger, 'filter': myFilter });
// returns an instance of TermExtractor with custom options

Example (Example usage of 'constructor' (with default values))

let termExtractor = new TermExtractor();
// returns an instance of TermExtractor with default options

termExtractor.extract(taggedTerms) ⇒ Object

Extract terms

Kind: instance method of TermExtractor
Returns: Object - All extracted terms

Param Type Description
taggedTerms Array List of tagged terms

Example (Example usage of 'extract' function)

let termExtractor = new TermExtractor(),
  myDefaultTagger = new Tagger(),
  taggedTerms = myDefaultTagger.tag('This is a sample test for this module. It index any fulltext. It is a sample test.');
termExtractor.extract(taggedTerms);
// return
// { 'sample': { 'frequency': 2, 'strength': 1 }, 'test': { 'frequency': 2, 'strength': 1 },
// 'sample test': { 'frequency': 2, 'strength': 2 },
// 'module': { 'frequency': 1, 'strength': 1 },
// 'index': { 'frequency': 1, 'strength': 1 },
// 'fulltext': { 'frequency': 1, 'strength': 1 }
// };
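
Since 'extract' returns a plain object keyed by term, the result can be post-processed with standard object helpers. The sketch below copies the documented output above as sample data and ranks the terms. `topTerms` and its `minFrequency` parameter are hypothetical and only illustrate one way to consume the result.

```javascript
// Sample data copied from the documented output of extract().
const extracted = {
  'sample': { frequency: 2, strength: 1 },
  'test': { frequency: 2, strength: 1 },
  'sample test': { frequency: 2, strength: 2 },
  'module': { frequency: 1, strength: 1 },
  'index': { frequency: 1, strength: 1 },
  'fulltext': { frequency: 1, strength: 1 }
};

// Hypothetical helper: keep terms seen at least minFrequency times,
// then list them from highest to lowest strength.
function topTerms(terms, minFrequency) {
  return Object.entries(terms)
    .filter(([, stats]) => stats.frequency >= minFrequency)
    .sort(([, a], [, b]) => b.strength - a.strength)
    .map(([term]) => term);
}

console.log(topTerms(extracted, 2)); // [ 'sample test', 'sample', 'test' ]
```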

termExtractor._startsWith(str, prefix) ⇒ Boolean

Check whether the given string starts with the given prefix

Kind: instance method of TermExtractor
Returns: Boolean - true if the string starts with the prefix, false otherwise

Param Type Description
str String String in which the prefix is searched
prefix String Prefix to search for
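
No example is given for this method. As an illustration of the described behaviour, a minimal sketch of a prefix check is shown below. `startsWithSketch` is a hypothetical function, and the tag strings are only sample inputs.

```javascript
// Hypothetical equivalent of the described check, not the library's own code.
function startsWithSketch(str, prefix) {
  // Compare the leading characters of str with the prefix.
  return str.slice(0, prefix.length) === prefix;
}

console.log(startsWithSketch('NNP', 'NN')); // true
console.log(startsWithSketch('VBG', 'NN')); // false
```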
