| as.text.table | Convert a data.table column of character vectors into a column with one row per word, grouped by a grouping column. Optionally splits a column of strings into vectors of constituents. |
| flag_words | Flag rows in a text.table with specific words |
| label_parts_of_speech | Add a column with the parts of speech for each word in a text.table |
| l_pos | Parts of speech for English words from the Moby Project. |
| ngrams | Create n-grams |
| pos | Parts of speech for English words from the Moby Project. |
| regex_paragraph | Regular expression that might be used to split strings of text into component paragraphs. |
| regex_sentence | Regular expression that might be used to split strings of text into component sentences. |
| regex_word | Regular expression that might be used to split strings of text into component words. |
| rm_frequent_words | Delete rows in a text.table where the number of identical records within a group is more than a certain threshold |
| rm_infrequent_words | Delete rows in a text.table where the number of identical records within a group is less than a certain threshold |
| rm_long_words | Delete rows in a text.table where the word has more than a maximum number of characters |
| rm_no_overlap | Delete rows in a text.table where the records within a group are not also found in other groups (non-overlapping records) |
| rm_overlap | Delete rows in a text.table where the records within a group are also found in other groups (overlapping records) |
| rm_parts_of_speech | Delete rows in a text.table where the word has a certain part of speech |
| rm_regexp_match | Delete rows in a text.table where the record has a certain pattern indicated by a regular expression |
| rm_short_words | Delete rows in a text.table where the word has fewer than a minimum number of characters |
| rm_words | Remove rows from a text.table with specific words |
| sampleStr | Generate (pseudo)random strings of the specified character length |
| stopwords | Vector of lowercase English stop words. |
| str_any_match | Detect if there are any words in a vector also found in another vector. |
| str_counts | Create a list containing a vector of the unique words found in x and a vector of the counts of each word in x. |
| str_count_intersect | Count the intersecting words in a vector that are found in another vector (only counts unique words). |
| str_count_jaccard_similarity | Calculate the Jaccard similarity of two vectors of words: the size of their intersection divided by the size of their union. |
| str_count_match | Count the words in a vector that are found in another vector. |
| str_count_nomatch | Count the words in a vector that are not found in another vector. |
| str_count_positional_match | Count words from a vector that are found in the same position in another vector. |
| str_count_positional_nomatch | Count words from a vector that are not found in the same position in another vector. |
| str_count_setdiff | Count the words in a vector that don't intersect with another vector (only counts unique words). |
| str_dt_col_combine | Combine columns of a data.table into a list in a new column; wraps list(unlist(c(...))). |
| str_extract_match | Extract words from a vector that are found in another vector. |
| str_extract_nomatch | Extract words from a vector that are not found in another vector. |
| str_extract_positional_match | Extract words from a vector that are found in the same position in another vector. |
| str_extract_positional_nomatch | Extract words from a vector that are not found in the same position in another vector. |
| str_rm_blank_space | Remove or replace excess white space in strings. |
| str_rm_long_words | Remove words from a vector that have more than a maximum number of characters. |
| str_rm_non_alphanumeric | Remove or replace non-alphanumeric characters in strings. |
| str_rm_non_printable | Remove or replace non-printable characters in strings. |
| str_rm_numbers | Remove or replace numbers in strings. |
| str_rm_punctuation | Remove or replace punctuation in strings. |
| str_rm_regexp_match | Remove words from a vector that match a regular expression. |
| str_rm_short_words | Remove words from a vector that have fewer than a minimum number of characters. |
| str_rm_words | Remove words from a vector that are found in another vector of words. |
| str_rm_words_by_length | Remove words from a vector based on the number of characters in each word. |
| str_stopwords_by_part_of_speech | Create a vector of English words associated with particular parts of speech. |
| str_tolower | Calls base::tolower(), which converts letters to lowercase. Only included to point out that base::tolower exists and should be used directly. |
| str_weighted_count_match | Weighted count of the words in a vector that are found in another vector. |
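
The counting entries above differ in whether repeated words are counted, and the Jaccard entry is a ratio of two set sizes. The base R sketch below illustrates those ideas only; it does not call the package's functions, whose argument names are not shown in this table. `jaccard_similarity` is a hypothetical helper written for this example, and returning `NA` for two empty inputs is an assumption, not documented package behavior.

```r
# Hypothetical helper (not part of the package): Jaccard similarity of two
# word vectors, i.e. size of the intersection divided by size of the union.
jaccard_similarity <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  n_union <- length(union(x, y))
  if (n_union == 0) return(NA_real_)  # assumption: undefined for two empty inputs
  length(intersect(x, y)) / n_union
}

a <- c("the", "cat", "sat", "on", "the", "mat")
b <- c("the", "dog", "sat")

n_match     <- sum(a %in% b)            # every matching word, repeats included
n_intersect <- length(intersect(a, b))  # matching words, counted once each
n_setdiff   <- length(setdiff(a, b))    # unique words in a that are absent from b
jaccard     <- jaccard_similarity(a, b)

print(c(match = n_match, intersect = n_intersect, setdiff = n_setdiff))
print(jaccard)
```

Here "the" appears twice in `a`, so the repeat-sensitive count is 3 while the unique intersection is 2; the union holds 6 distinct words, giving a similarity of 2/6.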