# Keyword Extractor

A simple [NPM package](https://npmjs.org/package/keyword-extractor) for extracting _keywords_ from a string by
removing stopwords.
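The core idea fits in a few lines: split the text into words and drop every word that appears in a stopword list. This is a minimal sketch, not the package's implementation. `extractKeywords` is a hypothetical name, the tokenizer is a simple regular expression, and the eight-word `STOPWORDS` set is an assumption chosen only to cover the example sentence used in the usage section. The package itself ships full per-language lists.

```javascript
// Minimal sketch of stopword-based keyword extraction.
// ASSUMPTION: `extractKeywords` and this tiny STOPWORDS set are hypothetical,
// picked to cover the example sentence; they are not the package's code or lists.
const STOPWORDS = new Set(["a", "up", "that", "many", "in", "both", "could", "his"]);

function extractKeywords(text) {
  return text
    .toLowerCase()
    // Split on any run of characters that is not a letter, digit or apostrophe.
    .split(/[^\p{L}\p{N}']+/u)
    // Drop empty strings produced by leading or trailing punctuation.
    .filter((word) => word.length > 0)
    // Keep every word that is not in the stopword set.
    .filter((word) => !STOPWORDS.has(word));
}

const sentence =
  "President Obama woke up Monday facing a Congressional defeat that many in both parties believed could hobble his presidency.";
console.log(extractKeywords(sentence));
```

With this stopword set, the sketch returns the same eleven lower-cased words as the usage example.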

## Installation

```sh
$ npm install keyword-extractor
```
### Building the demo with Browserify

The demo is bundled with Browserify. Install it globally so the `browserify` command is available, edit `sample.js` as needed, and then rebuild the bundle:

```sh
$ npm install -g browserify
$ cd keyword-extractor/demo
$ browserify sample.js -o bundle.js
```
> See the [Browserify documentation](http://browserify.org/) for detailed usage.

## Running tests

To run the test suite, first install the development dependencies by running the following command within the package's
directory.

```sh
$ npm install
```

To execute the package's tests, run:

```sh
$ make test
```

## Usage of the Module

```javascript
// Include the Keyword Extractor
var keyword_extractor = require("keyword-extractor");

// Opening sentence of the NY Times article at
// http://www.nytimes.com/2013/09/10/world/middleeast/surprise-russian-proposal-catches-obama-between-putin-and-house-republicans.html
var sentence = "President Obama woke up Monday facing a Congressional defeat that many in both parties believed could hobble his presidency.";

// Extract the keywords
var extraction_result = keyword_extractor.extract(sentence, {
    language: "english",
    remove_digits: true,
    return_changed_case: true,
    remove_duplicates: false
});

/*
  extraction_result is:

  [
    "president",
    "obama",
    "woke",
    "monday",
    "facing",
    "congressional",
    "defeat",
    "parties",
    "believed",
    "hobble",
    "presidency"
  ]
*/
```

### Options

The second argument of the _extract_ method is an options object that controls how the extraction is performed.

Parameter Name | Description | Permitted Values
---------------|-------------|-----------------
language | The stopword list to use. | _english_, _spanish_, _polish_, _german_, _french_, _italian_, _dutch_, _romanian_, _russian_, _portuguese_, _swedish_
remove_digits | Removes all digits from the results if set to _true_. | _true_ or _false_
return_changed_case | Controls the case of the extracted keywords. If _true_, the results are returned in lower case; if _false_, they keep their original case. | _true_ or _false_
return_chained_words | Instead of returning each word separately, joins words that appeared next to each other in the original text. If _true_, the words are joined; if _false_, each word is returned as a separate array element. | _true_ or _false_
remove_duplicates | Removes duplicate keywords. | _true_ or _false_ (defaults to _false_)
return_max_ngrams | Returns keywords that are n-grams with a size of at most _integer_. | _integer_ or _false_ (defaults to _false_)
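One way to picture these options is as a post-processing pipeline over the tokenized text. The sketch below is hypothetical: `extractWithOptions`, its tokenizer, the tiny `STOPWORDS` set, and the order in which the options are applied are assumptions for illustration, not the package's source. It covers `remove_digits`, `return_changed_case`, `return_chained_words` and `remove_duplicates`, and leaves out `language` and `return_max_ngrams`.

```javascript
// Hypothetical sketch: applying extraction options to a token stream.
// ASSUMPTIONS: `extractWithOptions`, the tokenizer, the tiny STOPWORDS set and
// the order of the steps are illustrative only, not the package's code.
const STOPWORDS = new Set(["a", "and", "in", "is", "of", "the"]);

function extractWithOptions(text, options = {}) {
  const {
    remove_digits = false,
    return_changed_case = false,
    return_chained_words = false,
    remove_duplicates = false,
  } = options;

  // Split on anything that is not a letter, digit or apostrophe.
  let tokens = text.split(/[^\p{L}\p{N}']+/u).filter((t) => t.length > 0);

  // remove_digits: strip digits from every token and drop tokens left empty.
  if (remove_digits) {
    tokens = tokens.map((t) => t.replace(/\d/g, "")).filter((t) => t.length > 0);
  }

  // Group consecutive non-stopwords into runs; each stopword closes the current run.
  const runs = [];
  let run = [];
  for (const token of tokens) {
    if (STOPWORDS.has(token.toLowerCase())) {
      if (run.length > 0) runs.push(run);
      run = [];
    } else {
      run.push(token);
    }
  }
  if (run.length > 0) runs.push(run);

  // return_chained_words: one phrase per run; otherwise one entry per word.
  let keywords = return_chained_words ? runs.map((r) => r.join(" ")) : runs.flat();

  // return_changed_case: lower-case everything; otherwise keep the original case.
  if (return_changed_case) {
    keywords = keywords.map((k) => k.toLowerCase());
  }

  // remove_duplicates: keep the first occurrence of each keyword, preserving order.
  if (remove_duplicates) {
    keywords = [...new Set(keywords)];
  }
  return keywords;
}

const text = "The cat and the Cat sat in 2 hats";
console.log(extractWithOptions(text));
// → [ 'cat', 'Cat', 'sat', '2', 'hats' ]
console.log(extractWithOptions(text, { remove_digits: true, return_changed_case: true, remove_duplicates: true }));
// → [ 'cat', 'sat', 'hats' ]
console.log(extractWithOptions(text, { remove_digits: true, return_chained_words: true }));
// → [ 'cat', 'Cat sat', 'hats' ]
```

This sketch lower-cases before removing duplicates, so _Cat_ and _cat_ collapse into a single keyword. The table does not state whether the package applies its steps in the same order.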


## Credits

The initial stopword lists are taken from the following sources:

- English: <http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop>
- Spanish: <https://stop-words.googlecode.com/svn/trunk/stop-words/stop-words/stop-words-spanish.txt>