Collecting / Storing Tweets with Python and MySQL

A few days ago, I published Collecting / Storing Tweets with Python and MongoDB. In that post, I describe the steps needed to collect and store tweets gathered via the Twitter Streaming API.

I received a comment on that post asking how to store data into MySQL instead of MongoDB. Here’s what you’d need to do to make that change.

Collecting / Storing Tweets with Python and MySQL

In the previous post, we store the twitter data into MongoDB with one line of code:

Unlike MongoDB where we can insert the entire json object, if we want to use MySQL instead of MongoDB, we need to do some additional work with the ‘datajson’ object before storing it.

Let’s assume that we are interested in just capturing the username, date, Tweet and Tweet ID from twitter.  This is most likely the bare minimum information you’d want to capture from the API…there are many (many) more fields available but for now, we’ll go with these four.

Note: I’ll hold off on the MySQL specific changes for now and touch on them shortly.

Once you capture the tweet (line 38 in my original script) and have it stored in your datajson object, create a few variables to store the date, username, Tweet and ID.

Note in the above that we are using the parser.parse() command from the dateutil  module to parse the created_at date for storage into Mysql.

Now that we have our variables ready to go, we’ll need to store those variables into MySQL.  Before we can do that, we need to set up a MySQL connection. I use the python-mysql connector but you are free to use what you need to.  You’ll need to do a import MySQLdb  to get the connector imported into your script (and assuming you installed the connector with pip install mysql-python.

You’ll need to create a table to store this data. You can use the sql statement below to do so if you need assistance / guidance.

Now, let’s set up our MySQL connection, query and execute/commit for the script. I’ll use a function for this to be able to re-use it for each tweet captured.

That’s it.  You are now collecting tweets and storing those tweets into a MySQL database.

Full Script

 


MySQL for PythonWant more information on working with MySQL and Python? Check out the book titled MySQL for Python by Albert Lukaszewski.

 

Eric D. Brown , D.Sc. has a doctorate in Information Systems with a specialization in Data Sciences, Decision Support and Knowledge Management. He writes about utilizing python for data analytics at pythondata.com and the crossroads of technology and strategy at ericbrown.com

24 thoughts on “Collecting / Storing Tweets with Python and MySQL

  1. File “tweet.py”, line 60
    except Exception as e:
    ^
    IndentationError: unindent does not match any outer indentation level

    Just add a space in line 60, other than that it works like magic.
    Thanks

    1. Also, can you tell me how to handle the geo tag? I want to store coordinates whenever available.

      I added a geo tag, but I got:

      (1064, ‘You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \’: “\’Point\'”, u\’coordinates\’: \'(28.5383355,-81.3792365)\’})\’ at line 1′)

      Thanks

        1. I see the function mysql_escape_string(), but there is no explanation or examples for it. : ( sorry don’t mean to bother you

  2. so where/when does this script actually stop running? I imagine it isnt filling up your database with a tonne of live tweets until your database can no longer run. Is there a certain amount of tweets it stores?

    thanks

  3. expected an intended block

    db=MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset=”utf8″)

    1. I’m not quite sure what you are asking.

      Did you mean “indendted” block? If so, you always need to make sure your code has the proper spacing (indentations) for python. When copy/pasting code, this can be an issue.

  4. Hi Eric,

    Helpful blog, thanks.

    I keep getting this error – tweet collected at 2017-02-20 19:16:36+00:00
    (1292, “Incorrect datetime value: ‘2017-02-20 19:16:36+00:00’ for column ‘created_at’ at row 1”)

    Would you be able to help me?

      1. Hi Eric,

        Thanks for your reply. MySQL DDL:

        CREATE TABLE twitter (
        id int(11) NOT NULL AUTO_INCREMENT,
        tweet_id varchar(250) DEFAULT NULL,
        screen_name varchar(128) DEFAULT NULL,
        created_at timestamp NULL DEFAULT NULL,
        text text,
        PRIMARY KEY (id)
        ) ENGINE=InnoDB AUTO_INCREMENT=73 DEFAULT CHARSET=utf8

        I experienced with changing the ‘timestamp’ to ‘varchar(250)’. Then the script works. I think the problem is that script gives me back the following answer: tweet collected at 2017-03-08 10:11:11+00:00. The ‘+00:00’ which indicates the timezone is probably the problem for the SQL database? I can’t find a way to adjust this string.

  5. What datatype would I use for the MySql create query if I wanted to use the coordinates, user, retweet, and retweet_count parameters? Would everything essentially be returned as a string that gets processed into JSON? Thanks in advance!

    1. I would store coordinates, user and retweet as strings and retweet_count as an int. You could store everything as a string if you wanted and then handle the content in your script when you read it out if you wanted to.

  6. Traceback (most recent call last):
    I have a problem, i can’t insatll the module MySQLdb. I try to install it with the following comment : “pip install mysql-python”.
    File “tweet_sql.py”, line 7, in
    import MySQLdb
    ImportError: No module named MySQLdb

    1. ImportError: No module named MySQLdb

      I resolved this on Centos6 using: yum install MySQL-python
      Might be different install method depending on OS

  7. Hi Eric,

    I have used this as a base to for a twitter scraper, I am collecting alot more information from twitter than the four columns mentioned in this post.

    For some reason when i run the scrip it just keeps printing ‘verified’ instead of printing the time the tweet was collected. Also none of the data is being inserted into the MySQL database i created to house the information. Any insight you have would be invaluable, thanks for this post regardless.

Leave a Reply

Your email address will not be published. Required fields are marked *