# enable high-res images in notebook 
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
conda install -c conda-forge pymongo
conda install -c conda-forge dnspython

- pymongo library: interact with MongoDB databases from Python.
- dnspython library: used as part of connecting to a MongoDB Atlas cluster.
- Your keys.py file must contain your MongoDB Atlas connection string as mongo_connection_string's value. Replace "<PASSWORD>" in the connection string with your password, and replace the database name "test" with "senators", which will be the database name in this example.

import tweepy, keys
auth = tweepy.OAuthHandler(
    keys.consumer_key, keys.consumer_secret)
auth.set_access_token(keys.access_token, 
    keys.access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, 
                 wait_on_rate_limit_notify=True)               
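All of the credentials above come from keys.py. That file is not shown in this section; a minimal sketch of its form, with placeholder values only, might look like this (the variable names match those used in this example):

# keys.py -- placeholder values; substitute your own credentials
consumer_key = 'YourConsumerKey'
consumer_secret = 'YourConsumerSecret'
access_token = 'YourAccessToken'
access_token_secret = 'YourAccessTokenSecret'
mapquest_key = 'YourMapQuestKey'
mongo_connection_string = 'mongodb+srv://YourUser:YourPassword@YourCluster.mongodb.net/senators'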
senators.csv (provided in the notebook's folder) contains each senator's state, name, party, Twitter handle and Twitter ID.

import pandas as pd
senators_df = pd.read_csv('senators.csv')
senators_df['TwitterID'] = senators_df['TwitterID'].astype(str)  # Stream's follow argument requires string IDs
senators_df.head()
MongoClient

from pymongo import MongoClient
atlas_client = MongoClient(keys.mongo_connection_string)
db is a pymongo Database object representing the senators database:

db = atlas_client.senators
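If you'd like to confirm the connection before streaming, pymongo's MongoClient provides list_database_names:

# verify the Atlas connection -- note that the senators database appears
# in this list only after its first collection has been created
print(atlas_client.list_database_names())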
TweetListener uses the db object representing the senators database to store tweets.

from tweetlistener import TweetListener
tweet_limit = 10000  # maximum number of tweets to receive
twitter_stream = tweepy.Stream(api.auth, 
    TweetListener(api, db, tweet_limit)) 
Track senators’ Twitter handles as keywords and follow their IDs:

twitter_stream.filter(track=senators_df.TwitterHandle.tolist(),
    follow=senators_df.TwitterID.tolist())
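Once streaming ends (TweetListener terminates the stream after tweet_limit tweets), you can verify what was stored; list_collection_names and count_documents are standard pymongo methods:

print(db.list_collection_names())     # should include 'tweets'
print(db.tweets.count_documents({}))  # total tweets stored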
TweetListener

Class TweetListener is based on the version from the “Data Mining Twitter” chapter.

# tweetlistener.py
"""TweetListener downloads tweets and stores them in MongoDB."""
import json
import tweepy
from IPython.display import clear_output
class TweetListener(tweepy.StreamListener):
    """Handles incoming Tweet stream."""
    def __init__(self, api, database, limit=10000):
        """Create instance variables for tracking number of tweets."""
        self.db = database
        self.tweet_count = 0
        self.TWEET_LIMIT = limit  # 10,000 by default
        super().__init__(api)  # call superclass's init
    def on_connect(self):
        """Called when your connection attempt is successful, enabling 
        you to perform appropriate application tasks at that point."""
        print('Successfully connected to Twitter\n')
    def on_data(self, data):
        """Called when Twitter pushes a new tweet to you."""
        self.tweet_count += 1  # track number of tweets processed
        json_data = json.loads(data)  # convert string to JSON
        self.db.tweets.insert_one(json_data)  # store in tweets collection
        clear_output()  # ADDED: show one tweet at a time in Jupyter Notebook
        print(f'    Screen name: {json_data["user"]["screen_name"]}')
        print(f'     Created at: {json_data["created_at"]}')         
        print(f'Tweets received: {self.tweet_count}')         
        # if TWEET_LIMIT is reached, return False to terminate streaming
        return self.tweet_count < self.TWEET_LIMIT
    def on_error(self, status):
        """Called if Twitter returns an error; True continues streaming."""
        print(status)
        return True
TweetListener (cont.)

- In the “Data Mining Twitter” chapter, TweetListener overrode method on_status to receive Tweepy Status objects representing tweets.
- Here we override the on_data method instead. Rather than Status objects, on_data receives each tweet object’s raw JSON.
- json.loads converts the raw JSON received by on_data into a Python JSON object (a dictionary).
- MongoDB databases store Collections of documents. The expression self.db.tweets accesses the Database object db’s tweets Collection, creating it if it does not already exist.
- The tweets Collection’s insert_one method stores the JSON object in the tweets collection.
- To prepare for full-text searching, create a text index on the collection; ('$**', 'text') indexes every text field in each document:

db.tweets.create_index([('$**', 'text')])
- The tweets Collection’s count_documents method with a full-text search counts the total number of documents in the collection that contain the specified text.
- We search for each handle in the senators_df.TwitterHandle column.
- {"$text": {"$search": senator}} indicates that we’re using the text index to search for the value of senator.

tweet_counts = []
for senator in senators_df.TwitterHandle:
    tweet_counts.append(db.tweets.count_documents(
        {"$text": {"$search": senator}}))
Copy DataFrame senators_df, adding a new Tweets column containing the tweet_counts values:

tweet_counts_df = senators_df.assign(Tweets=tweet_counts)
tweet_counts_df.sort_values(by='Tweets', ascending=False).head(10)
state_codes.py contains a dictionary that maps two-letter state codes to full state names. We use geopy to look up the location of each state.

from geopy import OpenMapQuest
import time
from state_codes import state_codes
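The state_codes.py file itself is not shown in this section; based on the description above, an abbreviated sketch of its contents would be:

# state_codes.py -- abbreviated sketch; the real file maps all the states
state_codes = {'AL': 'Alabama', 'AK': 'Alaska', 'AZ': 'Arizona'}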
Create a geocoder object to translate location names into Location objects:

geo = OpenMapQuest(api_key=keys.mapquest_key)
states = tweet_counts_df.State.unique()  # get unique state names
states.sort() 
Geocode each state name followed by ', USA':

locations = []
from IPython.display import clear_output
for state in states:
    processed = False
    delay = .1 
    while not processed:
        try: 
            locations.append(geo.geocode(state_codes[state] + ', USA'))
            clear_output()  # clear cell's current output before showing next one
            print(locations[-1])  
            processed = True
        except Exception:  # timed out, so wait before trying again
            print('OpenMapQuest service timed out. Waiting.')
            time.sleep(delay)
            delay += .1
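Each element of locations is a geopy Location object (or None if a lookup found nothing); address, latitude and longitude are standard Location attributes:

# inspect the first geocoded result
if locations and locations[0] is not None:
    print(locations[0].address)
    print(locations[0].latitude, locations[0].longitude)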
- DataFrame method groupby groups the senators by state.
- as_index=False indicates that the state codes should be a column in the resulting DataFrame, rather than the indices for its rows.
- The GroupBy object's sum method totals the numeric data by state.

tweets_counts_by_state = tweet_counts_df.groupby(
    'State', as_index=False).sum()
tweets_counts_by_state.head()
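Depending on your pandas version, sum either drops the non-numeric columns or concatenates their strings. If you see odd string columns in the result, selecting only the Tweets column first produces the same State/Tweets totals (a sketch, equivalent for this data):

# total only the Tweets column per state
tweets_counts_by_state = tweet_counts_df.groupby(
    'State', as_index=False)['Tweets'].sum()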
import folium
usmap = folium.Map(location=[39.8283, -98.5795], 
                   zoom_start=4, detect_retina=True,
                   tiles='Stamen Toner')
choropleth = folium.Choropleth(
    geo_data='us-states.json',
    name='choropleth',
    data=tweets_counts_by_state,
    columns=['State', 'Tweets'],
    key_on='feature.id',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Tweets by State'
).add_to(usmap)
layer = folium.LayerControl().add_to(usmap)
Choropleth keyword arguments

- geo_data='us-states.json': the file containing the GeoJSON that specifies the shapes to color.
- name='choropleth': Folium displays the Choropleth as a layer over the map. This is the name for that layer in the map's layer controls, which enable you to hide and show the layers. These controls appear when you click the layers icon on the map.
- data=tweets_counts_by_state: a pandas DataFrame (or Series) containing the values that determine the Choropleth colors.
- columns=['State', 'Tweets']: when the data is a DataFrame, this is a list of two columns representing the keys and the corresponding values used to color the Choropleth.
- key_on='feature.id': the variable in the GeoJSON file to which the Choropleth binds the values in the columns argument.
- fill_color='YlOrRd': a color map specifying the colors to use to fill in the states. Folium provides 12 colormaps: 'BuGn', 'BuPu', 'GnBu', 'OrRd', 'PuBu', 'PuBuGn', 'PuRd', 'RdPu', 'YlGn', 'YlGnBu', 'YlOrBr' and 'YlOrRd'. You should experiment with these to find the most effective and eye-pleasing ones for your applications.
- fill_opacity=0.7: a value from 0.0 (transparent) to 1.0 (opaque) specifying the transparency of the fill colors displayed in the states.
- line_opacity=0.2: a value from 0.0 (transparent) to 1.0 (opaque) specifying the transparency of the lines that delineate the states.
- legend_name='Tweets by State': at the top of the map, the Choropleth displays a color bar (the legend) indicating the value range represented by the colors. This legend_name text appears below the color bar to indicate what the colors represent.

Next, add a popup marker for each state listing its two senators:

- groupby maintains the original row order in each group.
- index: used to look up each state's location in the locations list.
- group: the collection of a given state's two senators.

sorted_df = tweet_counts_df.sort_values(by='Tweets', ascending=False)
for index, (name, group) in enumerate(sorted_df.groupby('State')):
    strings = [state_codes[name]]  # used to assemble popup text
    for s in group.itertuples():
        strings.append(f'{s.Name} ({s.Party}); Tweets: {s.Tweets}')
        
    text = '<br>'.join(strings)  
    popup = folium.Popup(text, max_width=200)
    marker = folium.Marker(
        (locations[index].latitude, locations[index].longitude), 
        popup=popup)
    marker.add_to(usmap) 
Evaluate the usmap object in a code cell to display the map in a notebook. You also can save the map to an HTML file:

usmap.save('SenatorsTweets.html')
from IPython.display import IFrame
IFrame(src="./SenatorsTweets.html", width=800, height=450)
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.