
When working with Tweets, it is often useful to count how many times each Emoji appears across the entire Tweet corpus. top_n_emojis does this, which makes it easy to see how Emojis are distributed across the corpus. Every occurrence is counted: if a Tweet contains 10 Emojis, top_n_emojis counts all 10 and assigns each one to its respective Emoji category. Note that the Unicode values returned by top_n_emojis can contain duplicates, because some Unicode characters are associated with more than one Emoji name. By default, duplicates are not returned, but users can set duplicated_unicode = 'yes' to include them.
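To illustrate the per-occurrence counting described above, here is a minimal base R sketch. It is not how top_n_emojis is implemented: emoji_lookup is a hypothetical three-row stand-in for the package's Emoji dictionary, and splitting text into single characters only catches Emojis made of one code point (multi-code-point Emojis such as flags are ignored).

```r
# Minimal sketch of per-occurrence Emoji counting in base R.
# Assumption: `emoji_lookup` is a tiny hypothetical table, not the
# dictionary top_n_emojis actually uses. Only single-code-point Emojis
# are handled (flags and ZWJ sequences are ignored).
emoji_lookup <- data.frame(
  unicode = c("\U0001f600", "\U0001f603", "\U0001f637"),
  emoji_name = c("grinning", "smiley", "face_with_medical_mask"),
  emoji_category = "Smileys & Emotion",
  stringsAsFactors = FALSE
)

tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637.")

# Split every Tweet into single characters and keep only known Emojis,
# so an Emoji repeated within one Tweet is counted every time it occurs.
chars <- unlist(strsplit(tweets, ""))
found <- chars[chars %in% emoji_lookup$unicode]

# Count occurrences, then attach names and categories from the lookup.
counts <- as.data.frame(table(unicode = found), responseName = "n",
                        stringsAsFactors = FALSE)
result <- merge(emoji_lookup, counts, by = "unicode")
result <- result[order(-result$n),
                 c("emoji_name", "unicode", "emoji_category", "n")]
result
```

Because the mask Emoji is repeated three times within a single Tweet, it is counted three times, mirroring the face_with_medical_mask count in the Examples section.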

Usage

top_n_emojis(tweet_tbl, tweet_text, n = 20, duplicated_unicode = "no")

Arguments

tweet_tbl

A data frame or tibble containing tweets or other text.

tweet_text

The column containing the tweet text.

n

The number of top Emojis to return. Default is 20.

duplicated_unicode

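Whether to return duplicated Unicode values (Unicode characters shared by more than one Emoji name). Use "no" to return each Unicode only once, or "yes" to keep the duplicates. Default is "no".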
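The following base R sketch shows how duplicated Unicode values can arise and one way they could be dropped. It is an illustration under assumptions, not the package's internal code: emoji_lookup is a hypothetical table in which one Unicode is mapped to two names, and keeping the first row per Unicode is only an assumed analogue of duplicated_unicode = "no".

```r
# Hypothetical lookup: U+1F600 is deliberately given two names.
emoji_lookup <- data.frame(
  unicode = c("\U0001f600", "\U0001f600", "\U0001f637"),
  emoji_name = c("grinning", "grinning_face", "face_with_medical_mask"),
  stringsAsFactors = FALSE
)

# Hypothetical per-Unicode occurrence counts.
counts <- data.frame(unicode = c("\U0001f600", "\U0001f637"),
                     n = c(2L, 3L), stringsAsFactors = FALSE)

# Joining counts to names yields one row per (Unicode, name) pair, so the
# shared Unicode appears twice: analogous to duplicated_unicode = "yes".
with_dups <- merge(counts, emoji_lookup, by = "unicode")

# Keeping only the first row per Unicode: analogous to duplicated_unicode = "no".
without_dups <- with_dups[!duplicated(with_dups$unicode), ]

nrow(with_dups)     # 3
nrow(without_dups)  # 2
```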

Value

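A tibble with the top n Emojis, containing the columns emoji_name, unicode, emoji_category, and n (the number of occurrences).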

Examples

library(dplyr)
library(tidyEmoji)
# Count every Emoji occurrence across all Tweets and keep the 2 most frequent
data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                      "R is my language! \U0001f601\U0001f606\U0001f605",
                      "This Tweet does not have Emoji!",
                      "Wearing a mask\U0001f637\U0001f637\U0001f637.",
                      "Emoji does not appear in all Tweets",
                      "A flag \U0001f600\U0001f3c1")) %>%
  top_n_emojis(tweets, n = 2)
#> # A tibble: 2 × 4
#>   emoji_name             unicode emoji_category        n
#>   <chr>                  <chr>   <chr>             <int>
#> 1 face_with_medical_mask 😷      Smileys & Emotion     3
#> 2 grinning               😀      Smileys & Emotion     2