When working with Tweets, it is useful to count how many times each Emoji appears in the
entire Tweet corpus. This is where top_n_emojis comes into play: it shows how Emojis are
distributed across the corpus. Every occurrence counts, so if a Tweet has 10 Emojis,
top_n_emojis counts all 10 and assigns each one to its respective Emoji category. Note that
the Unicodes returned by top_n_emojis could contain duplicates, because some Unicodes map to
more than one Emoji name. By default, duplicates are not returned; set
duplicated_unicode = 'yes' to keep them.
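As a rough illustration of the counting rule above (every occurrence counts, so a Tweet containing three 😷 adds 3 to that Emoji's total), here is a minimal sketch using dplyr, tidyr and stringr. It is not how top_n_emojis is implemented: count_emoji_occurrences is a hypothetical helper, and it assumes Emojis can be approximated by the code-point range U+1F300 to U+1FAFF instead of a full Emoji dictionary, so it returns raw Emoji characters without names or categories.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical helper, for illustration only (not part of the package).
# Assumption: Emojis are approximated by the code points U+1F300 to U+1FAFF;
# the real function matches against a full Emoji dictionary instead.
count_emoji_occurrences <- function(tweet_tbl, tweet_text, n = 20) {
  tweet_tbl %>%
    # One list element per Tweet, holding every Emoji it contains (repeats kept)
    mutate(emoji = str_extract_all({{ tweet_text }}, "[\\x{1F300}-\\x{1FAFF}]")) %>%
    select(emoji) %>%
    # One row per Emoji occurrence; Tweets without Emojis drop out here
    unnest(emoji) %>%
    # Total occurrences per Emoji across the whole corpus, largest first
    count(emoji, sort = TRUE) %>%
    head(n)
}
```

Applied to the Tweets from the example further down with n = 2, the first row is 😷 with a count of 3; the second row has a count of 2, since 😀 and 😃 are tied there.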
Arguments
- tweet_tbl
A dataframe/tibble containing tweets/text.
- tweet_text
The tweet/text column.
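- n
The number of top Emojis to return. Default is 20.
- duplicated_unicode
Whether to return duplicated Unicodes: 'no' returns each Unicode once, 'yes' keeps
Unicodes that map to several Emoji names. Default is 'no'.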
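To picture what duplicated_unicode controls, the sketch below joins per-Unicode counts to a tiny, made-up Emoji dictionary in which one Unicode carries two names. Everything here is hypothetical: emoji_dict, the alias grinning_face and the helper label_counts are invented for illustration, and keeping the first listed name is an assumption, not necessarily the rule top_n_emojis applies.

```r
library(dplyr)

# Hypothetical dictionary: U+1F600 is listed under two names
emoji_dict <- tibble(
  unicode    = c("\U0001f600", "\U0001f600", "\U0001f637"),
  emoji_name = c("grinning", "grinning_face", "face_with_medical_mask")
)

# Hypothetical per-Unicode occurrence counts
counts <- tibble(unicode = c("\U0001f637", "\U0001f600"), n = c(3L, 2L))

# Hypothetical helper: attach names, optionally collapsing duplicated Unicodes
label_counts <- function(counts, dict, duplicated_unicode = "no") {
  # A Unicode with k names in the dictionary yields k rows here
  labelled <- inner_join(counts, dict, by = "unicode")
  if (duplicated_unicode == "no") {
    # Assumption: keep only the first name listed for each Unicode
    labelled <- distinct(labelled, unicode, .keep_all = TRUE)
  }
  labelled
}
```

With duplicated_unicode = "yes" the result has three rows (😀 appears twice, once per name); with "no" it has two, one per Unicode.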
Examples
library(dplyr)
data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
"R is my language! \U0001f601\U0001f606\U0001f605",
"This Tweet does not have Emoji!",
"Wearing a mask\U0001f637\U0001f637\U0001f637.",
"Emoji does not appear in all Tweets",
"A flag \U0001f600\U0001f3c1")) %>%
top_n_emojis(tweets, n = 2)
#> # A tibble: 2 × 4
#> emoji_name unicode emoji_category n
#> <chr> <chr> <chr> <int>
#> 1 face_with_medical_mask 😷 Smileys & Emotion 3
#> 2 grinning 😀 Smileys & Emotion 2