Exploring social media analytics with R

The original link to the ACIS 2018 tutorial


Resource:

  • [Packages to be used – please install]
> install.packages("rtweet")
> install.packages("ggplot2")
> install.packages("tidytext")
> install.packages("dplyr")
> install.packages("readr")
> install.packages("stringr")
> install.packages("tidyr")
> install.packages("scales")
> install.packages("wordcloud")
> install.packages("reshape2")
  • Reference material

Setting up the Twitter R package for text analytics

Setting API for rtweet


Connecting to Twitter from R

Before you start working with Twitter in R, you need to set up your access on Twitter itself. Please see Setting up API in Twitter.

Note down:

  • Your App Name
  • Your Consumer Key (API Key)
  • Consumer Secret (API Secret)

There are two main R packages for working with Twitter. In this tutorial we will be using rtweet.

To use this package you need to install it in R first. You do this by running the following line:

> install.packages("rtweet")

We will also need several more packages to work with data:

> install.packages("ggplot2")
> install.packages("tidytext")
> install.packages("dplyr")
> install.packages("readr")
> install.packages("stringr")
> install.packages("tidyr")
> install.packages("scales")
> install.packages("wordcloud")
> install.packages("reshape2")

and then load these packages:

> library(rtweet)
> library(ggplot2)
> library(tidytext)
> library(dplyr)
> library(readr)
> library(stringr)
> library(tidyr)
> library(scales)
> library(wordcloud)
> library(reshape2)

Now you are all ready to go!

To connect to Twitter you need to set up your access variables:

> # whatever name you assigned to your created app
> appname <- "MariaP"
> 
> ## api key (example below is not a real key)
> key <- "XXXXXXXXXX"
> 
> ## api secret (example below is not a real key)
> secret <- "YYYYYYYYYY"

Now let’s create a token and connect!

> twitter_token <- create_token(
>   app = appname,
>   consumer_key = key,
>   consumer_secret = secret)

The create_token function sends a request to generate your access token. The technical part of this is explained here
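Hard-coding keys in a script makes them easy to leak when the script is shared. One common alternative is to read them from environment variables. The sketch below assumes three hypothetical variable names (TWITTER_APP, TWITTER_API_KEY, TWITTER_API_SECRET) and a hypothetical helper, read_credential; none of these come from rtweet itself.

```r
# Hypothetical helper: fetch one credential and fail loudly if it is missing.
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# For demonstration only: set placeholder values for the current session.
# In practice these lines would live in ~/.Renviron, not in the script.
Sys.setenv(TWITTER_APP        = "MariaP",
           TWITTER_API_KEY    = "XXXXXXXXXX",
           TWITTER_API_SECRET = "YYYYYYYYYY")

appname <- read_credential("TWITTER_APP")
key     <- read_credential("TWITTER_API_KEY")
secret  <- read_credential("TWITTER_API_SECRET")
```

The resulting appname, key and secret can be passed to create_token exactly as before.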


We are all ready to go! Let’s search for tweets.

The search_tweets function is fantastic. It allows you to search hashtags and user timelines. It takes the following arguments:

> SydneyTweets <- search_tweets(q = "Sydney", n = 1000, lang = "en", include_rts = FALSE)  
> MelbourneTweets <- search_tweets(q = "Melbourne", n = 1000, lang = "en", include_rts = FALSE)
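  • q: query to be searched
  • n: number of tweets to return. A single rate-limit window returns at most 18,000 tweets; to collect more, set the retryonratelimit argument to TRUE so that the search continues in “batches” once the limit resets
  • lang: restricts the results to tweets in the given language ("en" for English)
  • include_rts: if set to FALSE, retweets are excluded from the results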
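To get a feel for what “batches” means in practice, here is a rough back-of-the-envelope sketch. It assumes the standard search limit of 18,000 tweets per 15-minute window; the helper estimate_search is hypothetical and only does arithmetic, it does not call Twitter.

```r
# Assumed limits of the standard search API (not read from rtweet).
tweets_per_window <- 18000
window_minutes    <- 15

# Hypothetical helper: how many rate-limit windows does a request for n tweets need?
estimate_search <- function(n) {
  windows <- ceiling(n / tweets_per_window)
  # The first window needs no waiting; each later one waits for a reset.
  list(windows = windows, wait_minutes = (windows - 1) * window_minutes)
}

estimate_search(1000)   # fits in one window, no waiting
estimate_search(50000)  # three windows, so roughly 30 minutes of waiting

# The matching search call would look like this (not run here):
# SydneyTweets <- search_tweets(q = "Sydney", n = 50000, retryonratelimit = TRUE)
```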

Let’s see how frequently tweets appear

> ts_plot(SydneyTweets, by="days")

Frequency of tweets with #sydney

or if we want to be more fancy!

> ts_plot(SydneyTweets, "mins") +
>   labs(
>     x = "Date and time",
>     y = "Frequency of tweets",
>     title = "Time series of #Sydney tweets"
>   ) +
>   theme_dark()

Adding some elements to the graph
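A time series plot like the ones above boils down to counting tweets per time interval. The sketch below shows that counting step with dplyr on a made-up toy_tweets data frame (the timestamps are invented for illustration). It is meant to show the idea, not how rtweet implements ts_plot internally.

```r
library(dplyr)

# Toy stand-in for a tweets data frame; only created_at matters here.
toy_tweets <- tibble(
  created_at = as.POSIXct(c("2018-12-01 09:15:00", "2018-12-01 18:40:00",
                            "2018-12-02 07:05:00", "2018-12-02 12:30:00",
                            "2018-12-02 22:48:14"), tz = "UTC")
)

# Round each timestamp down to its day, then count the tweets per day.
daily_counts <- toy_tweets %>%
  mutate(day = as.Date(created_at)) %>%
  count(day)

daily_counts
```

Changing the interval ("days", "hours", "mins") only changes how the timestamps are rounded before counting.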

We can get tweets from a particular Twitter account using get_timeline

>   MelbourneCityTweets <- get_timeline("cityofmelbourne")
>   SydneyCityTweets <- get_timeline("cityofsydney")

Let’s visualise their frequencies

>   ts_plot(MelbourneCityTweets, "days")
>   ts_plot(MelbourneCityTweets, "hours")
>   ts_plot(SydneyCityTweets, "days")
>   ts_plot(SydneyCityTweets, "hours")

Frequencies of @CityOfMelbourne tweet, days

Frequencies of @CityOfMelbourne tweet, hours

Let’s merge two datasets for hashtag tweets and add a label ‘city’

>   tweets <- bind_rows(MelbourneTweets %>% 
>                         mutate(city = "Melbourne"),
>                       SydneyTweets %>% 
>                         mutate(city = "Sydney")) 

Let’s count how many times each user used the hashtag for the city

>   tweets <- tweets %>% 
+     add_count(user_id)
>   # kable() comes from the knitr package, which is not loaded above
>   knitr::kable(tweets[5:10, 3:6])

|created_at |screen_name |text |source |
|:---|:---|:---|:---|
|2018-12-02 22:48:14 |SBEAustralia |Interested in scaling your business as a #womenintech & #womeninSTEM? Don’t miss out on our upcoming event in #Melbourne on 11 Dec at #telstralabs with alum Anabela Correia, Kathy Harrison, @AlisonHardacre & @AyalaDomani of @Telstra RSVP now: https://t.co/Vo4MWPlyIB https://t.co/BbhGgoonxi |Twitter Web Client |
|2018-12-02 22:48:12 |beatthatflight |Aussie flight deal: Possible business class error fare! Singapore to Sydney/Melbourne/Perth on Qantas from https://t.co/AWbqSIHKTN #travel |Beat That Flight |
|2018-12-02 14:52:55 |beatthatflight |Aussie flight deal: Sydney/Melbourne to Honolulu, Hawaii from $495/ $511 Return on Jetstar https://t.co/yrnqzuKRS8 #travel |Beat That Flight |
|2018-12-02 22:48:09 |ibmlgbt |IBM was honored to welcome Guide Dogs Victoria to our #AccessAbilityDay afternoon tea in Melbourne #inclusiveIBM |Twitter Web Client |
|2018-12-02 22:47:23 |LukusWilson |@itshannahbowman so you mean melbourne right? |Twitter for iPhone |
|2018-12-02 22:47:09 |ANZStadium |PRAYERS answered: It’s @BonJovi week, Sydney! Get in the mood for Saturday by reading this review of their show in Melbourne on Saturday: https://t.co/YHUdxbUIhi #ThisHouseIsNotForSale https://t.co/VfmQFKbCvj |Twitter Web Client |

and let’s draw it

>   ggplot(tweets, aes(x = user_id, y=n, color= city)) + geom_point()
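To see what the merge and add_count steps attach to each row, here is a small sketch on invented data (the user ids and texts are made up). One thing it makes visible: add_count(user_id) counts a user's tweets across both cities, not per city.

```r
library(dplyr)

# Two toy result sets standing in for MelbourneTweets and SydneyTweets.
melb <- tibble(user_id = c("u1", "u2", "u1"), text = c("a", "b", "c"))
syd  <- tibble(user_id = c("u1", "u3"),       text = c("d", "e"))

toy <- bind_rows(melb %>% mutate(city = "Melbourne"),
                 syd  %>% mutate(city = "Sydney")) %>%
  add_count(user_id)   # adds column n: rows per user over the whole table

toy
# User u1 appears three times in total, so every u1 row gets n = 3,
# including the u1 row labelled Sydney.
```

If a per-city count is what you are after, add_count(city, user_id) counts each user separately within each city.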



You can also search Twitter user data using lookup_users:

> users <- c("cityofsydney", "cityofmelbourne")
> cityTweets <- lookup_users(users)
> # kable() comes from the knitr package, which is not loaded above
> knitr::kable(cityTweets[5:10, 3:6])

|created_at |screen_name |text |source |
|:---|:---|:---|:---|
|2018-12-03 00:39:26 |cityofmelbourne |This morning Cr @BeverleyPinder launched the 2018 Victorian Disability Sport and Recreation Festival at Southbank’s Crown Riverwalk. Head down to explore and experience accessible and inclusive sport. It’s on from 10am – 3pm. https://t.co/KHfYkwqXCH |Twitter for iPhone |
|2018-12-02 20:25:07 |cityofmelbourne |Jane Harper dreamed up some of her first novel at City Library. Now she’s hit the big time. https://t.co/v2jlr2VxWH https://t.co/T2TJWP86is |Hootsuite Inc. |
|2018-12-01 03:28:15 |cityofmelbourne |Celebrating A Very Koorie Krismas @FedSquare https://t.co/jiVdIxnfe8 |Twitter for iPhone |
|2018-12-01 01:40:06 |cityofmelbourne |Get a taste of Christmas, the Gingerbread Village has opened at Federation Square https://t.co/wnNc0u30d1 https://t.co/oO2mJUx8Ga |Hootsuite Inc. |
|2018-11-30 23:35:07 |cityofmelbourne |Sing and splash, bust a rhyme or kick back with cool tunes at our pools this summer. https://t.co/uJ434t96El https://t.co/z2COQsagXu |Hootsuite Inc. |
|2018-11-30 10:41:32 |cityofmelbourne |Federation Square is LIT. Thanks to everyone who came and celebrated the lighting of the Christmas tree. Make sure you check out our Christmas events. https://t.co/wnNc0ukB4z https://t.co/sHhZGrD6Ou |Twitter for iPhone |

Look up friends (the accounts these users follow)

>   city_fds <- get_friends(users)

Look up followers

>   city_flw <- get_followers("cityofsydney", n = 75000)

Look up data on followers’ accounts

>   city_flw_data <- lookup_users(city_flw$user_id)

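Let’s have a look at word frequencies in tweets. First we need to clean the tweets: we drop retweets, remove unwanted characters that are specific to Twitter (such as HTML entities), and filter out stop words and tokens without letters. The steps below group by a city column, so cityTweets is rebuilt here from the two account timelines collected earlier, labelled by city.

> # Assumption: cityTweets should hold both timelines with a city label,
> # because the steps below group by city (lookup_users() does not add one)
> cityTweets <- bind_rows(MelbourneCityTweets %>% mutate(city = "Melbourne"),
+                         SydneyCityTweets %>% mutate(city = "Sydney"))
> remove_reg <- "&amp;|&lt;|&gt;"
> # token = "tweets" keeps hashtags and @mentions intact (available in the
> # tidytext versions that were current when this tutorial was written)
> cityTweets_tidy <- cityTweets %>% 
+     filter(!str_detect(text, "^RT")) %>%
+     mutate(text = str_remove_all(text, remove_reg)) %>%
+     unnest_tokens(word, text, token = "tweets") %>%
+     filter(!word %in% stop_words$word,
+            !word %in% str_remove_all(stop_words$word, "'"),
+            str_detect(word, "[a-z]"))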
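The first two cleaning steps are easier to follow on a couple of invented tweets. The sketch below uses only dplyr and stringr and leaves out tokenisation and stop-word removal; the example texts are made up.

```r
library(dplyr)
library(stringr)

remove_reg <- "&amp;|&lt;|&gt;"

# Two invented tweets: one retweet and one original tweet with HTML entities.
toy <- tibble(text = c("RT @someone: Big news &amp; more",
                       "Fish &amp; chips at #Sydney &gt; anything else"))

cleaned <- toy %>%
  filter(!str_detect(text, "^RT")) %>%               # drop rows starting with RT
  mutate(text = str_remove_all(text, remove_reg))    # strip the escaped &, < and >

cleaned$text
```

Only the second tweet survives, and its entities are gone. The extra spaces left behind do not matter, because tokenisation splits on whitespace anyway.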

Calculate frequencies of words for these two accounts

>   frequency <- cityTweets_tidy %>% 
+     group_by(city) %>% 
+     count(word, sort = TRUE) %>% 
+     left_join(cityTweets_tidy %>% 
+                 group_by(city) %>% 
+                 summarise(total = n())) %>%
+     mutate(freq = n/total) 
>   frequency
## # A tibble: 1,765 x 5
## # Groups:   city [2]
##    city      word          n total    freq
##    <chr>     <chr>     <int> <int>   <dbl>
##  1 Sydney    city         33  1475 0.0224 
##  2 Sydney    sydney       28  1475 0.0190 
##  3 Melbourne city         16  1198 0.0134 
##  4 Melbourne melbourne    13  1198 0.0109 
##  5 Sydney    ref          12  1475 0.00814
##  6 Melbourne christmas    11  1198 0.00918
##  7 Sydney    christmas    11  1475 0.00746
##  8 Sydney    nsw          11  1475 0.00746
##  9 Sydney    team         11  1475 0.00746
## 10 Sydney    transport    10  1475 0.00678
## # ... with 1,755 more rows
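The freq column is simply a word's count divided by the total number of words for that account; in the output above, 33 / 1475 gives the 0.0224 shown for "city" in Sydney. Here is a minimal sketch of the same calculation on an invented word list (the words and counts are made up, and count() with name = is used instead of summarise() purely for brevity).

```r
library(dplyr)

# Invented tidy word list: one row per word occurrence.
toy_words <- tibble(
  city = c("Sydney", "Sydney", "Sydney", "Sydney", "Melbourne", "Melbourne"),
  word = c("city", "city", "harbour", "team", "city", "tram")
)

toy_freq <- toy_words %>%
  count(city, word, sort = TRUE) %>%                      # n: occurrences per city and word
  left_join(toy_words %>% count(city, name = "total"),    # total: words per city
            by = "city") %>%
  mutate(freq = n / total)

toy_freq
```

Within each city the freq values add up to 1, which is a handy sanity check.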

Let’s make the table more readable

>   frequency <- frequency %>% 
+     select(city, word, freq) %>% 
+     spread(city, freq) %>%
+     arrange(Melbourne, Sydney)
>   
>   frequency
## # A tibble: 1,593 x 3
##    word       Melbourne   Sydney
##    <chr>          <dbl>    <dbl>
##  1 access      0.000835 0.000678
##  2 activated   0.000835 0.000678
##  3 australian  0.000835 0.000678
##  4 awards      0.000835 0.000678
##  5 beautiful   0.000835 0.000678
##  6 beginning   0.000835 0.000678
##  7 begins      0.000835 0.000678
##  8 bins        0.000835 0.000678
##  9 care        0.000835 0.000678
## 10 catch       0.000835 0.000678
## # ... with 1,583 more rows
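The wide table has fewer rows than the long one (1,593 against 1,765) because a word used by both accounts now takes one row instead of two. Words used by only one account get NA in the other column. A small sketch on invented frequencies shows both effects:

```r
library(dplyr)
library(tidyr)

# Invented long-format frequencies: "city" is shared, the others are not.
toy_long <- tibble(
  city = c("Melbourne", "Sydney", "Melbourne", "Sydney"),
  word = c("city", "city", "tram", "harbour"),
  freq = c(0.50, 0.40, 0.10, 0.20)
)

toy_wide <- toy_long %>%
  spread(city, freq) %>%          # one column per city, one row per word
  arrange(Melbourne, Sydney)      # ascending; rows with NA are sorted last

toy_wide
```

Four long rows become three wide rows, and "harbour" ends up last because its Melbourne value is NA. In a scatter plot of one column against the other, ggplot2 drops such rows with a warning.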

Let’s visualise frequencies

Words near the line are used with about equal frequencies by Melbourne and Sydney, while words far away from the line are used much more by one account compared to the other.

>   ggplot(frequency, aes(Melbourne, Sydney)) +
+     geom_jitter(alpha = 0.1, size = 2.5, width = 0.25, height = 0.25) +
+     geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
+     scale_x_log10(labels = percent_format()) +
+     scale_y_log10(labels = percent_format()) +
+     geom_abline(color = "red")
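If you would rather have a number than read distances off the plot, one option is the log ratio of the two frequencies: it is 0 for a word on the line and grows in size the further the word sits from it. This is an addition to the tutorial's approach rather than part of it, and the values below are invented.

```r
library(dplyr)

# Invented wide-format frequencies.
toy_wide <- tibble(
  word      = c("shared", "harbour", "tram"),
  Melbourne = c(0.010, 0.001, 0.008),
  Sydney    = c(0.010, 0.010, 0.002)
)

toy_ratio <- toy_wide %>%
  mutate(log_ratio = log10(Sydney / Melbourne)) %>%  # > 0: more Sydney, < 0: more Melbourne
  arrange(desc(abs(log_ratio)))                      # most lopsided words first

toy_ratio
```

Here "harbour" comes first (ten times more frequent for Sydney), followed by "tram" (more frequent for Melbourne), while "shared" has a log ratio of 0.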


Now let’s do wordclouds for a merged dataset

>   cityTweets_tidy %>% 
+     count(word, sort = TRUE)  %>%
+     with(wordcloud(word, n, max.words = 100))


and for @cityofSydney

>   cityTweets_tidy %>% 
+     filter(city=="Sydney")%>% 
+     count(word, sort = TRUE)  %>%
+     with(wordcloud(word, n, max.words = 100))


Let’s get the wordcloud for positive and negative words

>   cityTweets_tidy %>%
+     inner_join(get_sentiments("bing")) %>%
+     count(word, sentiment, sort = TRUE) %>%
+     acast(word ~ sentiment, value.var = "n", fill = 0) %>%
+     comparison.cloud(colors = c("gray20", "gray80"),
+                      max.words = 100)
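The pipeline above turns a word list into a matrix with one row per word and one column per sentiment, which is the shape comparison.cloud expects. The sketch below reproduces the steps before the plot on invented data. The lexicon here is a made-up four-word stand-in for get_sentiments("bing"), so the example runs without downloading anything.

```r
library(dplyr)
library(reshape2)

# Hypothetical mini lexicon standing in for the bing lexicon.
toy_lexicon <- tibble(
  word      = c("beautiful", "celebrate", "delay", "closed"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Invented tidy word list, one row per word occurrence.
toy_words <- tibble(word = c("beautiful", "city", "delay",
                             "beautiful", "closed", "team"))

sentiment_matrix <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%             # keep only words in the lexicon
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0)   # words as rows, sentiments as columns

sentiment_matrix
```

Words missing from the lexicon ("city", "team") are dropped by the inner join. The columns come out in alphabetical order (negative, then positive), so in the tutorial's call the first colour, "gray20", goes to negative words and "gray80" to positive ones.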

