Category: pandas

Stockstats – Python module for various stock market indicators

I’m always working with stock market data and stock market indicators. During this work, there’s times that I need to calculate things like Relative Strength Index (RSI), Average True Range (ATR), Commodity Channel Index (CCI) and other various indicators and stats.

My go-to for this type of work is TA-Lib and the python wrapper for TA-Lib but there’s times when I can’t install and configure TA-Lib on a computer. When this occurs, I then have to go find the various algorithms to calculate the various indicators / stats that I need.  When this happens, I usually end up making mistakes and/or taking longer than I really should to get these algo’s built to use in a project.  Of course I re-use what I can when I can but many times I’ve forgotten that I built an RSI function in the past and recreate.

I found myself in this situation today. I need an RSI calculation for some work I’m doing.  I couldn’t get TA-Lib installed and working on the machine I was working on (no clue what was wrong either) so I decided to write my own indicator.  While looking around the web for a good algorithm to use, I ran across a new module that I hadn’t see before called stockstats.

Stockstats is a wrapper for pandas dataframes and provides the ability to calculate many different stock market indicators / statistics.  The fact that it is a simple wrapper around pandas is ideal since I do 99% of my work within pandas.

To use stockstats, you simply to to ‘convert’ a pandas dataframe to a stockstats dataframe. This can be done like so:

stockstats_df = StockDataFrame.retype(df)

Then, to calculate the RSI for this dataframe, all you need to do is pass a command into the stockstats dataframe.

stock['rsi_14']

The above calculates the 14-day RSI for the entire dataframe.

Let’s look at a full example using data from yahoo.

First, import the modules we’ll need:

import pandas as pd
import pandas.io.data as web
from stockstats import StockDataFrame as Sdf

Pull down all the historical data for the S&P 500 ETF (SPY):

data = web.get_data_yahoo('SPY')

Taking a look at the ‘tail’ of the data gives us something like the data in Table 1.

SPY historical data - stock market indicators
Table 1: SPY Historical Data

To calculate RSI, retype the pandas dataframe into a stockstats dataframe and then calculate the 14-day RSI.

stock_df = Sdf.retype(data)
data['rsi']=stock_df['rsi_14']

With this approach, you end up with some extra columns in your dataframe. These can easily be removed with the ‘del’ command.

del data['close_-1_s']
del data['close_-1_d']
del data['rs_14']
del data['rsi_14']

With these extra columns removed, you now have the 14-day RSI values a column titled “rsi”.

SPY Historical Data with RSI  - stock market indicators
Table 2: SPY Historical Data with RSI

One caveat on this approach – stockstats seems to take the ‘close’ column. This might or might not be an issue for you if you are wanting to use the Adj Close column provided by yahoo. This is a simple fix (delete the ‘close’ and rename ‘adj close’ to ‘close’).

Stockstats currently has about 26 stats and stock market indicators included. Definitely not as robust as TA-Lib, but it does have the basics. If you are working with stock market data and need some quick indicators / statistics and can’t (or don’t want to) install TA-Lib, check out stockstats.

pandas Cheat Sheet (via yhat)

pandas cheat sheetThe folks over at yhat just released a cheat sheet for pandas.  You can download the cheat sheet in PDF for here.

There’s a couple important functions that I use all the time missing from their cheat sheet (actually….there are a lot of things missing, but its a great starter cheat sheet).

A few things that I use all the time with pandas dataframes that are worth collecting in one place are provided below.

Renaming columns in a pandas dataframe:

df.rename(columns={'col1': 'Column_1', 'col2': 'Column_2'}, inplace=True)

Iterating over a pandas dataframe:

for index, row in df.iterrows():
 * DO STUFF

Splitting pandas dataframe into chunks:

The function plus the function call will split a pandas dataframe (or list for that matter) into NUM_CHUNKS chunks. I use this often when working with the multiprocessing libary.

# This function creates chunks and returns them
def chunkify(lst,n):
    return [ lst[i::n] for i in xrange(n) ]
chunks = chunkify(df, NUMCHUNKS)

Accessing the value of a specific cell:

This will give you the value of the last row’s “COLUMN” cell.  This may not be the ‘best’ way to do it, but it gets the value

df.COLUMN.tail(1).iloc[0]

Getting rows matching a condition:

The below will get all rows in a pandas dataframe that match the criteria.  In addition to finding equality, you can do all the logical operators.

df[df.COLUMN == Criteria]

Getting rows matching multiple conditions:

This gets rows that match a criteria in COLUMN1 and those that match another criteria in COLUMN2

 df[(df.COLUMN1 == Criteria) & (df.COLUMN2 == Criteria_2) ]

Getting the ‘next’ row of data in a pandas dataframe

I’m currently working with stock market trade data that is output from a backtesting engine (I’m working with backtrader currently) in a pandas dataframe.  The format of the ‘transcations’ data that is provided out of the backtesting engine is shown below.

amountpricevalue
date
2016-01-07 00:00:00+00:0079.017119195.33-15434.413883
2016-09-07 00:00:00+00:00-79.017119218.8417292.106354
2016-09-20 00:00:00+00:0082.217609214.41-17628.277649
2016-11-16 00:00:00+00:00-82.217609217.5617887.263119

The data provided gives four crucial pieces of information:

  • Date – The date of a transaction.
  • Amount – the number of shares purchased (positive number) or sold (negative number)  during the transaction.
  • Price – the price received or paid at the time of the sale.
  • Value – the cash value of the transaction.

Backtrader’s transactions dataframe is comprised of two rows make for one one transaction (the first is the ‘buy’ the second is the ‘sell).  For example, in the data above, the first two rows (Jan 7 2016 and Sept 7th 2016) are the ‘buy’ data and ‘sell’ data for one transaction. What I need to do with this data is transform it (using that term loosely) into one row of data for each transaction to store into  database for use in another analysis.

I could leave it in its current form, but I prefer to store transactions in one row when dealing with market backtests.

There are a few ways to attack this particular problem.  You could iterate over the dataframe and manually pick each row. That would be pretty straightforward, but not necessarily the best way.

While looking around the web for some pointers, I stumbled across this answer that does exactly what I need to do.   I added the following code to my script and — voila — I have my transactions transformed from two rows per transaction to one row.

trade_output=[]
from itertools import tee, izip
def next_row(iterable):
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)
for (buy_index, buy_data), (sell_index, sell_data) in next_row(transactions.iterrows()):
    if buy_data['amount']>0:
        trade_output.append((buy_index.date(), buy_data["amount"], buy_data['price'],\
                             sell_index.date(), sell_data["amount"], sell_data['price']))

Note: In the above, I only want to build rows that start with a positive amount in the ‘amount’ column because amount in the first row of a transactions is always positive. I then append each transformed transaction into an array to be used for more analysis down at a later point in the script.