
data roundup

Python Data Weekly Roundup – Jan 10 2020

In this week’s Python Data Weekly Roundup:

A Comprehensive Learning Path to Understand and Master NLP in 2020

If you’re looking to learn more about Natural Language Processing (NLP) in 2020, this article lays out a good learning path, with links to articles, courses, videos and more to get you started on the road to becoming proficient with the tools and methods of NLP.

The Best of Both Worlds: Forecasting US Equity Market Returns using a Hybrid Machine Learning – Time Series Approach


Predicting long-term equity market returns is of great importance for investors to strategically allocate their assets. We apply machine learning methods to forecast 10-year-ahead U.S. stock returns and compare the results to traditional Shiller regression-based forecasts more commonly used in the asset-management industry. Machine-learning forecasts have similar forecast errors to a traditional return forecast model based on lagged CAPE ratios. However, machine-learning forecasts have higher forecast errors than the regression-based, two-step approach of Davis et al. [2018] that forecasts the CAPE ratio based on macroeconomic variables and then imputes stock returns. When we combine our two-step approach with machine learning to forecast CAPE ratios (a hybrid ML-VAR approach), U.S. stock return forecasts are statistically and economically more accurate than all other approaches. We discuss why and conclude with some best practices for both data scientists and economists in making real-world investment return forecasts.

Source: Improving U.S. stock return forecasts: A “fair-value” CAPE approach
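For readers new to CAPE, here’s a minimal sketch of the traditional baseline the abstract mentions: compute the CAPE ratio (real price divided by the average of the last ten years of inflation-adjusted earnings), then regress forward 10-year returns on its log. The function names and the log specification are my own assumptions for illustration. This is not the authors’ hybrid ML-VAR approach or their code.

```python
from typing import Sequence, Tuple

import numpy as np


def cape_ratio(real_price: float, real_annual_earnings: Sequence[float]) -> float:
    """Cyclically adjusted P/E: real price over the mean of ten years of real earnings."""
    if len(real_annual_earnings) != 10:
        raise ValueError("expected exactly ten years of inflation-adjusted earnings")
    return real_price / float(np.mean(real_annual_earnings))


def fit_cape_regression(cape_values: Sequence[float],
                        forward_returns: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares of forward annualized returns on log(CAPE).

    Returns (intercept, slope); np.polyfit with deg=1 returns [slope, intercept].
    """
    slope, intercept = np.polyfit(np.log(cape_values), forward_returns, deg=1)
    return float(intercept), float(slope)


def forecast_return(intercept: float, slope: float, current_cape: float) -> float:
    """Point forecast of the annualized forward return given today's CAPE."""
    return intercept + slope * float(np.log(current_cape))
```

A negative fitted slope means higher starting valuations go with lower subsequent returns, which is the relationship these forecasting models try to exploit.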

Building machine learning workflows with AWS Data Exchange and Amazon SageMaker

This article describes how to use AWS’s SageMaker and Data Exchange to build machine learning models and machine learning workflows. What I found interesting is the ability to use AWS Data Exchange to find a wide variety of different types of data.

Tutorial: Python Regex (Regular Expressions) for Data Scientists

I hate regex. Of course I love the functionality and capabilities of regex, but I loathe my inability to come up with my own regex ‘formulas’. I *always* have to go out on the web to search for how to do what I’m trying to do. This article doesn’t solve that problem for me, but it does provide a refresher on regex patterns and a reminder of why regex is important.
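As a small refresher of my own (not taken from the article), here’s a Python sketch using the standard library’s re module: a compiled pattern with named groups that pulls YYYY-MM-DD dates out of messy text, plus a whitespace cleanup with re.sub. The function names and the date format are just assumptions for the example.

```python
import re

# Named groups read better than positional group numbers; \b avoids matching
# digits that are part of a longer number.
ISO_DATE = re.compile(r"\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b")


def extract_dates(text: str) -> list:
    """Return (year, month, day) tuples for every YYYY-MM-DD date found in text."""
    return [(int(m["year"]), int(m["month"]), int(m["day"]))
            for m in ISO_DATE.finditer(text)]


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()
```

Compiling the pattern once and reusing it keeps the ‘formula’ in one named place, which at least makes it easier to find again next time.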

That’s it for this week’s Python Data Weekly Roundup. Subscribe to our newsletter to receive this weekly roundup in your email.


data roundup

Python Data Weekly Roundup – Dec 18 2019

In this week’s Python Data Weekly Roundup:

The Last Matplotlib Tweaking Guide You’ll Ever Need

This is a very good ‘how-to’ for beginners learning to tweak charts in the Matplotlib visualization library. The article explains how to change chart size, remove borders, and change the colors and widths of chart lines. Each tweak includes the Python code needed to make it.
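Here’s a minimal sketch of three of those tweaks (figure size, borders, and line color/width), written by me rather than copied from the guide. The helper name and the specific size, color and width values are arbitrary choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def tweaked_line_chart(x, y):
    """Draw a line chart with a custom size, no top/right borders, and a styled line."""
    fig, ax = plt.subplots(figsize=(8, 4))  # figure size in inches (width, height)
    ax.plot(x, y, color="tab:green", linewidth=2.5)  # line color and width
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # hide these borders around the plot area
    return fig, ax
```

Call fig.savefig("chart.png") on the returned figure to write the result to disk.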

Arithmetic, Geometric, and Harmonic Means for Machine Learning

Did you know there are different types of averages (aka means)? After reading this article, you’ll understand the difference between the arithmetic, geometric and harmonic means, why you should use one over another, and how to calculate each of them in Python.
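Here’s a short sketch of the three means written out by hand so the formulas are visible. The helper name is mine, not the article’s; Python’s statistics module also provides mean, geometric_mean (Python 3.8+) and harmonic_mean if you’d rather not roll your own.

```python
import math


def three_means(values):
    """Return (arithmetic, geometric, harmonic) means of a list of positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric and harmonic means need positive values")
    n = len(values)
    arithmetic = sum(values) / n                                # sum divided by count
    geometric = math.exp(sum(math.log(v) for v in values) / n)  # n-th root of the product
    harmonic = n / sum(1 / v for v in values)                   # count over sum of reciprocals
    return arithmetic, geometric, harmonic
```

For [1, 2, 4] the three means are 7/3, 2 and 12/7. For positive numbers the arithmetic mean is always at least the geometric mean, which is at least the harmonic mean; the F1 score, for example, is the harmonic mean of precision and recall.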

What is My Data Worth?

Should you be paid for all the personal data that you’ve made available online? If so, what is that data worth? In this fantastic article, Ruoxi Jia describes how to value personal data and how to apply the Shapley value to data valuation and to machine learning more generally (e.g., interpreting black-box model predictions). The graphic below shows two examples from the article:

(a) The Shapley value produced by our proposed exact approach and the baseline Monte-Carlo approximation algorithm for the KNN classifier constructed with 1000 randomly selected training points from MNIST. (b) Runtime comparison of the two approaches as the training size increases.

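To make the idea concrete, here’s a sketch of the exact Shapley value for an unweighted K-nearest-neighbor classifier and a single test point, using the sort-then-recurse formula from Jia and co-authors’ work on KNN data valuation (the kind of exact approach the caption above refers to, as I understand it). I’m assuming the utility of a subset is the fraction of its K nearest training points that share the test label and that distances are Euclidean; the function name is hypothetical. For a whole test set you’d average the per-point values.

```python
import math
from typing import Hashable, List, Sequence


def knn_shapley(train_x: Sequence[Sequence[float]],
                train_y: Sequence[Hashable],
                test_x: Sequence[float],
                test_y: Hashable,
                k: int) -> List[float]:
    """Exact Shapley value of each training point for one test point.

    Utility of a subset S: (number of the min(k, |S|) nearest points in S
    whose label equals test_y) / k.
    """
    n = len(train_x)
    if n == 0 or k < 1:
        raise ValueError("need at least one training point and k >= 1")
    # Training indices ordered from nearest to farthest from the test point.
    order = sorted(range(n), key=lambda i: math.dist(train_x[i], test_x))
    match = [1.0 if train_y[i] == test_y else 0.0 for i in order]

    values = [0.0] * n
    # Farthest point: s_N = match_N / N
    s = match[-1] / n
    values[order[-1]] = s
    # Walk toward the nearest point (rank i is 1-based):
    # s_i = s_{i+1} + (match_i - match_{i+1}) / k * min(k, i) / i
    for pos in range(n - 2, -1, -1):
        rank = pos + 1
        s += (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        values[order[pos]] = s
    return values
```

The cost is dominated by the sort, O(N log N), instead of the exponential number of subsets a brute-force Shapley computation needs. Because Shapley values are efficient, they sum to the utility of the full training set, which is a handy sanity check on a tiny example.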

FastSpeech: New text-to-speech model improves on speed, accuracy, and controllability

In this article, Microsoft Senior Researcher Xu Tan describes a new text-to-speech model called FastSpeech. The model is claimed to be fast, robust, controllable and high quality (all valuable and necessary features). A deep dive into this model can be found here.
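Much of FastSpeech’s controllability comes from a component the FastSpeech paper calls the length regulator: each phoneme’s hidden state is repeated according to its predicted duration, and scaling those durations by a factor α slows speech down or speeds it up. Here’s a toy sketch of that expansion step only, with plain Python values standing in for hidden-state vectors. The function name and the use of Python’s built-in round() are my assumptions, not Microsoft’s code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def length_regulate(phoneme_states: Sequence[T],
                    durations: Sequence[int],
                    alpha: float = 1.0) -> List[T]:
    """Expand phoneme-level states to frame level.

    Each state is repeated round(alpha * duration) times, so alpha > 1
    lengthens the output (slower speech) and alpha < 1 shortens it (faster).
    """
    if len(phoneme_states) != len(durations):
        raise ValueError("need one duration per phoneme state")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    frames: List[T] = []
    for state, duration in zip(phoneme_states, durations):
        frames.extend([state] * round(alpha * duration))
    return frames
```

Because the expansion is explicit, the output length is known before any audio frames are generated, which is what lets the rest of the model run in parallel rather than frame by frame. Python’s round() rounds halves to even, so very short durations can disappear at small α; a real system would need its own policy for that.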

How to Develop Super Learner Ensembles in Python

Another great article from Jason Brownlee, this one describing how to combine multiple models into an ensemble model for use in predictive modeling. Jason provides Python code that you can use to build your own Super Learner with scikit-learn. Additionally, and more importantly, Jason does a fantastic job of highlighting the theory behind Super Learners, with many links to articles and journal papers on the topic.
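Here’s a compact sketch of the core Super Learner recipe in scikit-learn, in the spirit of the article rather than a copy of Jason’s code: collect out-of-fold predictions from each base model with k-fold cross-validation, fit a meta-model on those predictions, then refit the base models on all the training data. The helper names, base models and dataset settings are my assumptions; scikit-learn’s StackingRegressor packages a similar idea.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold, cross_val_predict


def fit_super_learner(base_models, meta_model, X, y, n_splits=5, seed=0):
    """Fit base models plus a meta-model trained on their out-of-fold predictions."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Column j holds model j's predictions for rows it did not see during training.
    oof = np.column_stack([cross_val_predict(m, X, y, cv=cv) for m in base_models])
    meta = clone(meta_model).fit(oof, y)
    # Refit every base model on the full training data for use at prediction time.
    fitted = [clone(m).fit(X, y) for m in base_models]
    return fitted, meta


def predict_super_learner(fitted, meta, X):
    """Feed base-model predictions for new rows into the meta-model."""
    preds = np.column_stack([m.predict(X) for m in fitted])
    return meta.predict(preds)


if __name__ == "__main__":
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=500, n_features=10, noise=0.5, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
    base = [LinearRegression(), DecisionTreeRegressor(random_state=1), KNeighborsRegressor()]
    fitted, meta = fit_super_learner(base, LinearRegression(), X_tr, y_tr)
    print("Super Learner R^2:", r2_score(y_te, predict_super_learner(fitted, meta, X_te)))
```

Training the meta-model on out-of-fold predictions is the key design choice: if it saw in-sample predictions instead, it would overweight whichever base model overfits most.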

Strengthening the AI community

An overview of the DeepMind scholarship program as well as a description of why it makes sense to help others move into the field of AI.

Text Generation with Python

Natural Language Processing is well known as a way to analyze text. I’ve written a bit about using NLP here on the site (see here and here). In this article, Julien Heiduk describes how he used the GPT-2 model to generate text with Python. In fact, the article is almost entirely text generated by the GPT-2 model, and it does a good job of creating readable and understandable content.

Best Degree for Data Science (in One Picture)

Is there a ‘best’ degree for data science? Personally, I don’t think there is, but I can see some degrees being better than others for people who are just starting out. For example, all things being equal on the personal front, a degree in statistics is going to serve you much better than a degree in horticulture. That’s not to say the statistics degree makes you a better data scientist; it just gives you the tools to get into the field more quickly than someone with the horticulture degree. That said, I do like what Stephanie Glen says in this article when she writes: “getting a degree should be looked at as a stepping block, not a train ride to a destination. No single degree is likely to get you in the door.”

That’s it for this week’s Python Data Weekly Roundup. Subscribe to our newsletter to receive this weekly roundup in your email.