Collecting / Storing Tweets with Python and MySQL

A few days ago, I published Collecting / Storing Tweets with Python and MongoDB. In that post, I describe the steps needed to collect and store tweets gathered via the Twitter Streaming API.

I received a comment on that post asking how to store data into MySQL instead of MongoDB. Here’s what you’d need to do to make that change.

Collecting / Storing Tweets with Python and MySQL

In the previous post, we store the twitter data into MongoDB with one line of code:

Unlike MongoDB where we can insert the entire json object, if we want to use MySQL instead of MongoDB, we need to do some additional work with the ‘datajson’ object before storing it.

Let’s assume that we are interested in just capturing the username, date, Tweet and Tweet ID from twitter.  This is most likely the bare minimum information you’d want to capture from the API…there are many (many) more fields available but for now, we’ll go with these four.

Note: I’ll hold off on the MySQL specific changes for now and touch on them shortly.

Once you capture the tweet (line 38 in my original script) and have it stored in your datajson object, create a few variables to store the date, username, Tweet and ID.

Note in the above that we are using the parser.parse() command from the dateutil  module to parse the created_at date for storage into Mysql.

Now that we have our variables ready to go, we’ll need to store those variables into MySQL.  Before we can do that, we need to set up a MySQL connection. I use the python-mysql connector but you are free to use what you need to.  You’ll need to do a import MySQLdb  to get the connector imported into your script (and assuming you installed the connector with pip install mysql-python.

You’ll need to create a table to store this data. You can use the sql statement below to do so if you need assistance / guidance.

Now, let’s set up our MySQL connection, query and execute/commit for the script. I’ll use a function for this to be able to re-use it for each tweet captured.

That’s it.  You are now collecting tweets and storing those tweets into a MySQL database.

Full Script


MySQL for PythonWant more information on working with MySQL and Python? Check out the book titled MySQL for Python by Albert Lukaszewski.


Jupyter with Vagrant

I’ve written about using vagrant for 99.9% of my python work on here before (see here and here for examples).   In addition to vagrant, I use jupyter notebooks on 99.9% of the work that I do, so I figured I’d spend a little time describing how I use jupyter with vagrant.

First off, you’ll need to have vagrant set up and running (descriptions for linux, MacOS, Windows).   Once you have vagrant installed, we need to make a few changes to the VagrantFile to allow port forwarding from the vagrant virtual machine to the browser on your computer. If you followed the Vagrant on Windows post, you’ll have already set up the configuration that you need for vagrant to forward the necessary port for jupyter.   For those that haven’t read that post, below are the tweaks you need to make.

My default VagrantFile is shown in figure 1 below.

VagrantFile Example
Figure 1: VagrantFile Example

You’ll only need to change 1 line to get port forwarding working.   You’ll need to change the line that reads:

to the following:

This line will forward port 8888 on the guest to port 8888 on the host. If you aren’t using the default port of 8888 for jupyter, you’ll need to change ‘8888’ to the port you wish to use.

Now that the VagrantFile is ready to go, do a quick ‘vagrant up’ and ‘vagrant ssh’ to start your vagrant VM and log into it. Next, set up any virtual environments that you want / need (I use virtualenv to set up a virtual environment for every project).  You can skip this step if you wish, but it is recommended.

If you set up a virtual environment, go ahead and source into it so that you are using a clean environment and then run the command below to install jupyter. If you didn’t go then you can just run the below to install jupyter.

You are all set.  Jupyter should be installed and ready to go. To run it so it is accessible from your browser, just run the following command:

This command tells jupyter to listen on any IP address.

In your browser,  you should be able to visit your new fangled jupyter (via vagrant) instance by visiting the following url:

Now you’re ready to go with jupyter with vagrant.

Note: If you are wanting / needing to learn Jupyter, I highly recommend Learning IPython for Interactive Computing and Data Visualization (amazon affiliate link). I recommend it to all my clients who are just getting started with jupyter and ipython.



Stockstats – Python module for various stock market indicators

I’m always working with stock market data and stock market indicators. During this work, there’s times that I need to calculate things like Relative Strength Index (RSI), Average True Range (ATR), Commodity Channel Index (CCI) and other various indicators and stats.

My go-to for this type of work is TA-Lib and the python wrapper for TA-Lib but there’s times when I can’t install and configure TA-Lib on a computer. When this occurs, I then have to go find the various algorithms to calculate the various indicators / stats that I need.  When this happens, I usually end up making mistakes and/or taking longer than I really should to get these algo’s built to use in a project.  Of course I re-use what I can when I can but many times I’ve forgotten that I built an RSI function in the past and recreate.

I found myself in this situation today. I need an RSI calculation for some work I’m doing.  I couldn’t get TA-Lib installed and working on the machine I was working on (no clue what was wrong either) so I decided to write my own indicator.  While looking around the web for a good algorithm to use, I ran across a new module that I hadn’t see before called stockstats.

Stockstats is a wrapper for pandas dataframes and provides the ability to calculate many different stock market indicators / statistics.  The fact that it is a simple wrapper around pandas is ideal since I do 99% of my work within pandas.

To use stockstats, you simply to to ‘convert’ a pandas dataframe to a stockstats dataframe. This can be done like so:

Then, to calculate the RSI for this dataframe, all you need to do is pass a command into the stockstats dataframe.

The above calculates the 14-day RSI for the entire dataframe.

Let’s look at a full example using data from yahoo.

First, import the modules we’ll need:

Pull down all the historical data for the S&P 500 ETF (SPY):

Taking a look at the ‘tail’ of the data gives us something like the data in Table 1.

SPY historical data - stock market indicators
Table 1: SPY Historical Data

To calculate RSI, retype the pandas dataframe into a stockstats dataframe and then calculate the 14-day RSI.

With this approach, you end up with some extra columns in your dataframe. These can easily be removed with the ‘del’ command.

With these extra columns removed, you now have the 14-day RSI values a column titled “rsi”.

SPY Historical Data with RSI  - stock market indicators
Table 2: SPY Historical Data with RSI

One caveat on this approach – stockstats seems to take the ‘close’ column. This might or might not be an issue for you if you are wanting to use the Adj Close column provided by yahoo. This is a simple fix (delete the ‘close’ and rename ‘adj close’ to ‘close’).

Stockstats currently has about 26 stats and stock market indicators included. Definitely not as robust as TA-Lib, but it does have the basics. If you are working with stock market data and need some quick indicators / statistics and can’t (or don’t want to) install TA-Lib, check out stockstats.