We are going to do a lot of data cleaning in this first analysis. Our first goal is to separate genuine tweets from mass-marketing tweets and to determine which terms are most popular among the genuine ones. We start with 161,813 tweets, and for each tweet we have recorded the time it was created, its text, screen_name, source, and so on.

Apparently, this is the most retweeted tweet during WWDC. Schiller is the senior vice president of worldwide marketing at Apple and a prominent figure in Apple's public presentations.

We will now analyze the sources of tweets. That is, were the tweets made on an iPhone, an iPad, or some mass-marketing product?

    # Count tweets per source, most common first.
    df.groupby('source').size().sort_values(ascending=False)

The point is that many of the sources (like ShootingStarPorn) do not produce genuine tweets. Also, twitterfeed and are social media platforms for deploying mass tweets. We will want to take these out before doing our analysis.

    # Let's create a list of organic or genuine sources.
    organic_sources = ['Twitter for iPhone', 'Twitter Web Client', 'Twitter for Android',
                       'TweetDeck', 'Tweetbot for Mac', 'Tweetbot for iOS', 'Twitter for Mac',
                       'Twitter for iPad', 'twitterfeed', 'Twitter for Windows', 'Twitterrific',
                       'Echofon', 'OS X', 'Twitter for Windows Phone', '', 'Google']

We are now going to get rid of many tweets. That is, we will delete retweets, tweets that are not from genuine sources, and tweets that are only designed to get more followers on Twitter.

    # Let's remove retweets, remove inorganic sources, and retain only text-unique tweets.
    uniques = df.drop_duplicates(subset='text')
    # Drop retweets, in case RT was placed further in the text than the beginning.
    organics = uniques[~uniques['text'].str.contains(r'\bRT\b')]
    # Delete tweets from LinkedIn.
    organics = organics[~organics['text'].str.contains('LinkedIn', case=False)]
    # Delete tweets that only piggyback on other hashtags or topics to gain followers.
    for term in ['#XboxE3', '#mondaymotivation', '#gettingtoknowReedfans', 'Iniesta', 'Freshman']:
        organics = organics[~organics['text'].str.contains(term, case=False)]
    # Keep only tweets posted from organic sources.
    organics = organics[organics['source'].isin(organic_sources)]

Thus, we see that only 65,018 of our starting 161,813 tweets are original tweets. As you may have guessed, the majority of tweets with #wwdc are not individuals tweeting their own thoughts about the conference; there is a lot of marketing and product placement.

Now we are going to make a histogram of the most popular terms in the text of WWDC tweets.

    import re
    import string

    from nltk.corpus import stopwords
    from nltk.tokenize import TweetTokenizer

    # Strip @handles and shorten elongated words (e.g. "sooooo" -> "sooo").
    tknzr = TweetTokenizer(strip_handles=True, reduce_len=True)

    organic_words = []
    for row in organics['text']:
        # Lower-case each tweet and replace '#' with a space so '#wwdc' and 'wwdc' count as one term.
        organic_words.extend(tknzr.tokenize(re.sub(r'#', ' ', row.lower())))

    my_punctuations = list(string.punctuation)
    my_stopwords = stopwords.words('english')
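Once the tweets are tokenized and the stopword and punctuation lists are defined, the term histogram comes from counting whatever tokens remain. The following is a minimal sketch of that counting step, assuming the histogram is built from raw term frequencies. It uses `collections.Counter` on a small hypothetical token list, with a tiny hand-written set standing in for `stopwords.words('english')` so it runs without downloading NLTK data. The helper `top_terms` and the sample data are illustrative and not part of the original analysis.

```python
from collections import Counter
import string


def top_terms(words, stop_words, n=10):
    """Return the n most common terms, skipping stopwords and single-character punctuation."""
    punctuation = set(string.punctuation)
    # Keep a token only if it is neither a stopword nor a punctuation character.
    kept = [w for w in words if w not in stop_words and w not in punctuation]
    # most_common orders by count; ties keep the order in which terms were first seen.
    return Counter(kept).most_common(n)


# Hypothetical sample tokens, already lower-cased with '#' removed.
sample_words = ['wwdc', 'the', 'ios', '10', '!', 'wwdc', 'is', 'ios', 'wwdc', 'swift']
# A tiny stand-in for stopwords.words('english').
sample_stopwords = {'the', 'is', 'a', 'and'}

print(top_terms(sample_words, sample_stopwords, n=3))
```

The resulting (term, count) pairs can be passed to a bar-chart function such as matplotlib's `plt.bar` to draw the histogram.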