This app explores the modeling of text data, using the example of whether news headlines are real or clickbait. It lets the user fit models that
classify articles as clickbait or not clickbait based on the headline text.
The data are preloaded from the `textclassificationexamples` package, which can be installed using `remotes::install_github('leahannejohnson/textclassificationexamples')`.
A number of features are created by applying helper functions from the same package to the rows of the data frame with `dplyr::mutate()`. These functions include:
- `has_common_phrase()`: Takes a character string and returns a logical: TRUE if the string contains a common phrase, and FALSE if it does not.
- `has_exaggerated_phrase()`: Takes a character string and returns a logical: TRUE if the string contains an exaggerated phrase, and FALSE if it does not.
- `num_contractions()`: Takes a character string and returns an integer: the number of contractions contained in the string.
- `num_stop_words()`: Takes a character string and returns an integer: the number of stop words contained in the string.
- `num_pronouns()`: Takes a character string and returns an integer: the number of pronouns contained in the string.
- `starts_with_num()`: Takes a character string and returns a logical: TRUE if the string begins with a number, and FALSE if it does not.
- `has_question_word()`: Takes a character string and returns a logical: TRUE if the string contains a question word, and FALSE if it does not.
- `positivity()`: Takes a character string and returns the sum of the AFINN positivity scores of the words in the string.
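
As a rough illustration of the feature-creation step described above, the sketch below adds two logical columns to a small data frame with `dplyr::mutate()`. The headlines are invented for the example, and `starts_with_num_sketch()` and `has_question_word_sketch()` are hypothetical stand-ins written here so the code runs without the package installed. They are not the package's implementations, whose matching rules may differ.

```r
library(dplyr)

# Toy headlines invented for illustration; not taken from the package data.
headlines <- data.frame(
  title = c(
    "17 Things Only Cat Owners Will Understand",
    "Senate Passes Budget Bill After Long Debate",
    "Why Do We Dream?"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in (assumption): TRUE if the string begins with a digit.
starts_with_num_sketch <- function(x) {
  grepl("^[0-9]", x)
}

# Hypothetical stand-in (assumption): TRUE if the string contains one of a
# short, illustrative list of question words, matched case-insensitively.
has_question_word_sketch <- function(x) {
  grepl("\\b(who|what|when|where|why|how)\\b", x,
        ignore.case = TRUE, perl = TRUE)
}

# Create one feature column per helper, evaluated for every row.
features <- headlines %>%
  mutate(
    starts_with_num = starts_with_num_sketch(title),
    has_question_word = has_question_word_sketch(title)
  )

print(features)
```

With the package installed, the stand-ins would presumably be replaced by the package's own helpers, such as `starts_with_num()` and `has_question_word()`, applied to the headline column in the same way.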
Authors: Leah Johnson, Nicholas Horton
Last modified: January 21, 2022