I’m currently working with stock market trade data that is output from a backtesting engine (I’m working with backtrader currently) in a pandas dataframe. The format of the ‘transcations’ data that is provided out of the backtesting engine is shown below.
The data provided gives four crucial pieces of information:
- Date – The date of a transaction.
- Amount – the number of shares purchased (positive number) or sold (negative number) during the transaction.
- Price – the price received or paid at the time of the sale.
- Value – the cash value of the transaction.
Backtrader’s transactions dataframe is comprised of two rows make for one one transaction (the first is the ‘buy’ the second is the ‘sell). For example, in the data above, the first two rows (Jan 7 2016 and Sept 7th 2016) are the ‘buy’ data and ‘sell’ data for one transaction. What I need to do with this data is transform it (using that term loosely) into one row of data for each transaction to store into database for use in another analysis.
I could leave it in its current form, but I prefer to store transactions in one row when dealing with market backtests.
There are a few ways to attack this particular problem. You could iterate over the dataframe and manually pick each row. That would be pretty straightforward, but not necessarily the best way.
While looking around the web for some pointers, I stumbled across this answer that does exactly what I need to do. I added the following code to my script and — voila — I have my transactions transformed from two rows per transaction to one row.
trade_output= from itertools import tee, izip def next_row(iterable): a, b = tee(iterable) next(b, None) return izip(a, b) for (buy_index, buy_data), (sell_index, sell_data) in next_row(transactions.iterrows()): if buy_data['amount']>0: trade_output.append((buy_index.date(), buy_data["amount"], buy_data['price'],\ sell_index.date(), sell_data["amount"], sell_data['price']))
Note: In the above, I only want to build rows that start with a positive amount in the ‘amount’ column because amount in the first row of a transactions is always positive. I then append each transformed transaction into an array to be used for more analysis down at a later point in the script.