The jupyter notebook for this assignment may be found here.


Description

In this project you will develop tools for performing sentiment analysis on a database of tweets from across the country. When the project is complete you should be able to estimate the sentiment of tweets filtered by content.

There are 4 files provided here: http://www.cs.columbia.edu/~cannon/tweet_data/

1. all_tweets.txt is the large collection of tweets
2. some_tweets.txt is a subset of all_tweets that's more manageable to prototype on
3. sentiments.csv is a csv with word sentiment values
4. zips.csv (not required, see below)

We will go over the format of each of these files in class.

Tweets: We can represent a single tweet using a Python dictionary with the following entries:

text: a string, the text of the tweet all in lowercase
time: a datetime object, date and time of the tweet
latitude: a float, the latitude of the tweet's location
longitude: a float, the longitude of the tweet's location

Problem 1a

Create a list of dictionaries from the data in some_tweets.txt where each dictionary corresponds to a single tweet. If you change the format of the some_tweets file you should include your altered version with your submission.

In [ ]:
#your code here

Problem 1b

Create a single DataFrame from the list of tweets.

In [ ]:
#your code here

Problem 2

Write a function add_sentiment that adds a sentiment column to the DataFrame from 1b. Determine the sentiment of each tweet by taking the average sentiment over all of the words in the tweet. Use the sentiment values (between -1 and 1) in the sentiments.csv file to get the value of a word's sentiment.

Note: words without a sentiment do not have sentiment 0. They have no sentiment at all and should therefore not contribute to the average.

Your function should take as input a DataFrame of tweets together with the name of the sentiment file. Note that your function will be altering the DataFrame. This is a side effect. It's okay to do it this time.

In [ ]:
def add_sentiment(tweets, filename):
    #your code here

Problem 3

Write a function called tweet_filter that will return a new DataFrame of tweets filtered by the content of the tweet text. The input for this function should be a DataFrame of tweets and a list of words (strings). The function should return a DataFrame containing only the tweets that include all of the words in the word list, ignoring case and punctuation.
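The averaging rule in Problem 2 and the word-matching rule in Problem 3 both come down to how a tweet's text is split into words. The sketch below illustrates one possible reading of those rules on in-memory data only. The helper names (tokenize, average_sentiment, contains_all_words), the sample tweet, and the sample scores are all hypothetical and are not part of the assignment. The sketch also assumes that only ASCII punctuation needs to be removed and that a tweet with no scored words gets no sentiment (None) rather than 0. It does not read any of the provided files, since their format is not described here.

```python
import string
from datetime import datetime
from statistics import mean

# Hypothetical sample tweet using the four entries described above.
sample_tweet = {
    "text": "what a great day, i love new york!",
    "time": datetime(2011, 9, 1, 12, 30, 0),
    "latitude": 40.7128,
    "longitude": -74.0060,
}

# Hypothetical word scores; the real values would come from sentiments.csv.
sample_sentiments = {"great": 0.8, "love": 0.6, "awful": -0.9}

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def tokenize(text):
    """Lowercase the text, delete ASCII punctuation, split on whitespace."""
    return text.lower().translate(_STRIP_PUNCT).split()


def average_sentiment(text, sentiments):
    """Mean score over the words that have a score.

    Words missing from `sentiments` are skipped entirely, so they do not
    pull the average toward 0. Returns None when no word has a score
    (an assumption; the assignment does not say what to do in that case).
    """
    scores = [sentiments[w] for w in tokenize(text) if w in sentiments]
    return mean(scores) if scores else None


def contains_all_words(text, words):
    """True if every word in `words` appears in the text, ignoring case
    and ASCII punctuation."""
    tweet_words = set(tokenize(text))
    needed = {w.lower().translate(_STRIP_PUNCT) for w in words}
    needed.discard("")  # a search word made only of punctuation is ignored
    return needed <= tweet_words


print(average_sentiment(sample_tweet["text"], sample_sentiments))
print(contains_all_words(sample_tweet["text"], ["NEW", "york", "Love"]))
```

With a DataFrame of tweets, helpers like these could be applied to the text column, for example with tweets["text"].apply(...), to build the sentiment column or a boolean mask for filtering. Deleting punctuation turns a word such as "don't" into "dont", so whether that matches the entries in sentiments.csv would need to be checked against the actual file.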

