* `igraph::pagerank`
* Moved `smart_stopwords` to be internal data so that the package doesn't need to be explicitly loaded with `library` to be able to parse
* Changed `idf(d, t) = log( n / df(d, t) )` to `idf(d, t) = log( n / df(d, t) ) + 1` to avoid zeroing out common-word tfidf values
* `lexRank` and `unnest_sentences`
* Added `unnest_sentences` and `unnest_sentences_` to parse sentences in a dataframe following tidy data principles
* Added `bind_lexrank` and `bind_lexrank_` to calculate lexrank scores for sentences in a dataframe following tidy data principles (`unnest_sentences` & `bind_lexrank` can be used on a df in a magrittr pipeline)
* `sentenceSimil` now calculated using Rcpp; improves speed by ~25%-30% over the old implementation using the proxy package
* Added logic to avoid naming conflicts in `proxy::pr_DB` in `sentenceSimil` (#1, @AdamSpannbauer)
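The effect of the idf change listed above can be sanity-checked with a couple of lines of R. This is an illustration of the formula only, not lexRankr's internal code, and the document-count numbers are made up:

```r
# Illustrative only: why the "+ 1" keeps common words from zeroing out
n  <- 10   # total number of documents
df <- 10   # number of documents containing a very common term

old_idf <- log(n / df)      # log(1) = 0, so the term's tfidf is zeroed out
new_idf <- log(n / df) + 1  # 1, so common terms keep a nonzero weight
```

With the old formula, any term appearing in every document received an idf of exactly zero, wiping out its tfidf contribution; the `+ 1` floor keeps such terms weighted (lightly) instead of discarded.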
* Added check and error for cases where no sentences are above the threshold in `lexRankFromSimil` (#2, @AdamSpannbauer)
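The tidy workflow mentioned in the entries above can be sketched as a magrittr pipeline. This is a usage sketch, not package documentation: the input data frame and its column names (`doc_id`, `text`, and the output column `sents`) are assumptions for illustration:

```r
library(lexRankr)
library(magrittr)

# Hypothetical input: one row per document (column names are assumptions)
df <- data.frame(
  doc_id = 1:2,
  text   = c("Sentence one. Sentence two.",
             "Another doc. It has two sentences."),
  stringsAsFactors = FALSE
)

df %>%
  unnest_sentences(sents, text) %>%                 # one row per sentence
  bind_lexrank(sents, doc_id, level = "sentences")  # appends lexrank scores
```

Because both functions take and return a data frame, they compose directly in a pipeline, which is the point of the tidy-data additions.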
* `tokenize` now has stricter punctuation removal: removes all non-alphanumeric characters as opposed to removing `[:punct:]`
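The difference between the two removal rules can be sketched with `gsub`. These patterns are an illustration of the two approaches, not the package's exact implementation:

```r
x <- "R's regex: 100% ready!"

# Looser removal: strip only the POSIX punctuation class
gsub("[[:punct:]]", "", x)

# Stricter removal: strip everything that is not alphanumeric or a space;
# this also catches symbols (e.g. currency signs) that [:punct:] can miss
gsub("[^[:alnum:] ]", "", x)
```

For plain ASCII punctuation the two rules behave alike; the stricter negated-class rule additionally drops characters that fall outside both `[:alnum:]` and `[:punct:]`.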