In this week’s Python Data Weekly Roundup:
This is a very good ‘how to’ for beginners learning to customize charts built with the Matplotlib visualization library. The article explains how to change a chart’s size, remove its borders, and adjust the colors and widths of chart lines, with Python code for each tweak.
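As a taste of the kinds of tweaks the article covers, here is a minimal sketch using standard matplotlib calls (the data here is made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 6, 3]

# Change the figure size (width, height in inches)
fig, ax = plt.subplots(figsize=(10, 4))

# Change the color and width of the chart line
ax.plot(x, y, color="steelblue", linewidth=2.5)

# Remove the top and right borders (spines)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

fig.savefig("tweaked_chart.png")
```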
Did you know there are different types of averages (aka means)? After reading this article, you’ll understand the differences between the arithmetic, geometric and harmonic means, when to use one over the others, and how to calculate them with Python code.
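All three means can be computed with Python’s standard library alone; a quick sketch (geometric_mean requires Python 3.8+):

```python
from statistics import mean, geometric_mean, harmonic_mean

data = [2, 4, 8]

print(mean(data))            # arithmetic: (2 + 4 + 8) / 3 ≈ 4.67
print(geometric_mean(data))  # (2 * 4 * 8) ** (1/3) = 4.0
print(harmonic_mean(data))   # 3 / (1/2 + 1/4 + 1/8) ≈ 3.43
```

The arithmetic mean suits additive quantities, the geometric mean suits multiplicative ones (e.g., growth rates), and the harmonic mean suits rates and ratios (e.g., average speed).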
Should you be paid for all the personal data that you’ve made available online? If so, what is that data worth? In this fantastic article, Ruoxi Jia describes how to value personal data, applying the Shapley value both to data valuation and to machine learning more generally (e.g., interpreting black-box model predictions). The graphic below, taken from the article, shows two images:
(a) The Shapley value produced by our proposed exact approach and the baseline Monte-Carlo approximation algorithm for the KNN classifier constructed with 1000 randomly selected training points from MNIST. (b) Runtime comparison of the two approaches as the training size increases.
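To make the Shapley value concrete, here is a generic exact computation in pure Python: average each player’s marginal contribution over all orderings. Note this brute-force O(n!) approach is not the article’s efficient KNN-specific algorithm, and the per-point “quality” scores and utility function below are made up for illustration:

```python
from itertools import permutations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values: average each player's marginal
    contribution to the coalition over all n! orderings."""
    values = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = utility(coalition)
            coalition.add(p)
            values[p] += utility(coalition) - before
    n_orderings = factorial(len(players))
    return {p: v / n_orderings for p, v in values.items()}

# Hypothetical per-point "quality" scores for three training points;
# the coalition's utility is simply their sum (an additive game).
quality = {"a": 1.0, "b": 2.0, "c": 3.0}
vals = shapley_values(list(quality),
                      lambda coalition: sum(quality[p] for p in coalition))
print(vals)  # in an additive game, each Shapley value equals that point's own quality
```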
In this article, Microsoft Senior Researcher Xu Tan describes a new text-to-speech model called FastSpeech. The new model is claimed to be fast, robust, controllable and high quality (all valuable and necessary features). A deep dive into this model can be found here.
Another great article from Jason Brownlee, describing how to combine multiple models into an ensemble model for use in predictive modeling. Jason provides Python code that you can use to build your own Super Learner with scikit-learn. Additionally – and more importantly – Jason does a fantastic job of highlighting the theory behind Super Learners, with many links to articles and journal papers on the topic.
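The core of the Super Learner idea – train several base models, then fit a meta-learner on their out-of-fold predictions – can be sketched with scikit-learn’s built-in `StackingClassifier`. This is a minimal sketch on synthetic data, not Jason’s exact code; the choice of base learners and meta-learner here is arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data for illustration
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base learners whose out-of-fold predictions feed the meta-learner
base_learners = [
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
]

# The meta-learner (final_estimator) learns how to combine the base models
super_learner = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,  # 5-fold CV generates the out-of-fold predictions
)
super_learner.fit(X_train, y_train)
score = super_learner.score(X_test, y_test)
print(f"held-out accuracy: {score:.3f}")
```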
An overview of the DeepMind scholarship program as well as a description of why it makes sense to help others move into the field of AI.
Natural Language Processing is well known as a way to analyze text. I’ve written a bit about using NLP here on the site (see here and here). In this article, Julien Heiduk describes how he used the GPT-2 model to generate text with Python. In fact, the article is almost entirely text generated by the GPT-2 model, and it does a good job of creating readable and understandable content.
Is there a ‘best’ degree for data science? Personally, I don’t think there is, but I can see some degrees being better for people who are just starting out. For example, all things being equal on the personal front, a degree in statistics is going to serve you much better than a degree in horticulture. That’s not to say the statistics degree makes you a better data scientist; it just gives you the tools to get into the field quicker than someone with the horticulture degree. That said, I do like what Stephanie Glen says in this article when she writes: “getting a degree should be looked at as a stepping block, not a train ride to a destination. No single degree is likely to get you in the door.”
That’s it for this week’s Python Data Weekly Roundup. Subscribe to our newsletter to receive this weekly roundup in your email.