Category: book reveiw

Text Analytics with Python – A book review

Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your DataThis is a book review of Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar

One of my go-to books for natural language processing with Python has been Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper.  This has been the book for me and was one of my dissertation references.  I used this book so much, that I I had to buy a second copy of this book because I wore the first one out.  I’ve read many other NLP books but haven’t found any that could match this book – till now.

Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar is a fantastic book and has now taken a permanent place on my bookshelf.

Unlike many books that I run across, this book spends plenty of time talking about the theory behind things rather than just doing some hand-waving and then showing some code. In fact, there isn’t any code (that I saw) until page 41. That’s impressive these days.   Here’s a quick overview of the book’s layout:

  • Chapter 1 provides the baseline for Natural Language. This is a very good overview for anyone that’s never worked much with NLP.
  • Chapter 2 is a python ‘refresher’. If you don’t know python at all but know some other language, this should get you started enough to use the rest of the book.
  • Chapter’s 3 – 7 is there the real fun begins. These chapters cover Text Classification, Summarization Similarity / Clustering and Semantic / Sentiment Analysis.

If you have some familiarity with python and NLP, you can jump to Chapter 3 and dive into the details.

What I really like about this book is that it places theory first.  I’m a big fan of ‘learning by doing’ but I think before you can ‘do’ you need to know ‘why’ you are doing what you are doing.  The code in the book is really well done as well and uses the NLTK,  Sklearn and gensim libraries for most of the work. Additionally, there are multiple ‘build your own’ sections where the author provides a very good overview (and walk-through) of what it takes to build your own functionality for your own NLP work.

This book is highly recommended.


Links in this post:

Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper.

Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data by Dipanjan Sarkar

 

Book Review – Machine Learning With Random Forests And Decision Trees by Scott Hartshorn

Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some PythonI just finished reading Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some Python (amazon affiliate link).

The short review

This is a great introductory book for anyone looking to learn more about Random Forests and Decision Trees. You won’t be an expert after reading this book, but you’ll understand the basic theory and and how to implement random forests in python.

The long(ish) review

This is a short book – only 76 pages. But…those 76 pages are full of good, introductory information on Random Forests and Decision Trees.  Even though I’ve been using random forests and other machine learning approaches in python for years, I can easily see value for people that are just starting out with machine learning and/or random forests. That said, there were a few things in the book that I had either forgotten or didn’t know (Entropy Criteria for example).

While the entire book is excellent, the section on Feature Importance is the best in the book.  This section provides a very good description of the ‘why’ and the ‘how’ of feature importance (and therefore, feature selection) for use in random forests and decision trees.  There are some very good points made in this section regarding how to get started with feature selection and cross validation.

Additionally, the book provides a decent overview of the idea of ‘out-of-sample’ (or ‘Out-of-bag’) data.  I’m a huge believer in keeping some data out of your initial training data set to use for validation after you’ve built your models.

If you’re looking for a good introductory book on random forests and decision trees, pick this one up ( (amazon affiliate link)) …its only $2.99 for the kindle version.  Like I mentioned earlier, this book won’t make you an expert but it will provide a solid grounding to get started on the topic of random forests, decision trees and machine learning.

One negative comment I have on this book is that there is very little python in the book. The book isn’t marketed as strictly a python book, but I would have expected a bit more python in the book to help drive home some of the theory with runnable code. That said, this is a very small negative to the book overall.