Tag Archives: data science

Python (Anaconda) does not work with macOS Catalina!

This is just to highlight that the Anaconda Python distribution does not work with the latest macOS release, Catalina. I only realized this when I tried to open Anaconda Navigator after installing Catalina. The only (good) solution seems to be reinstalling Anaconda. Source: … Continue reading

Posted in Uncategorized

Best Pattern Recognition and Machine Learning Book (Bishop)

Pattern Recognition and Machine Learning (Information Science and Statistics) The above book by Christopher M. Bishop is widely regarded as one of the most comprehensive books on Machine Learning. At over 700 pages, it covers most machine learning … Continue reading

Posted in Uncategorized

2 types of chi-squared test

Most people have heard of the chi-squared test, but not many know that there are (at least) two types of chi-squared tests. The two most common are: (1) 1-way classification: the goodness-of-fit test; (2) 2-way classification: the contingency test. The goodness-of-fit chi-squared test … Continue reading
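
To make the distinction concrete, here is a minimal sketch in plain Python of the test statistic for each of the two tests. The function names and the example counts are made up for illustration and are not taken from the post. Both tests use the same formula, the sum of (observed - expected)^2 / expected; they differ only in where the expected counts come from.

```python
def goodness_of_fit_statistic(observed, expected):
    """1-way classification: compare observed counts with counts
    expected under a hypothesised distribution."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def contingency_statistic(table):
    """2-way classification: expected counts are derived from the
    row and column totals of the table itself."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # Degrees of freedom for an r x c table: (r - 1) * (c - 1).
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical data: 60 rolls of a die, tested against a fair die.
die_counts = [8, 12, 9, 11, 10, 10]
print(goodness_of_fit_statistic(die_counts, [10] * 6))

# Hypothetical 2 x 2 table: two groups by two outcomes.
print(contingency_statistic([[10, 20], [20, 10]]))
```

The statistic is then compared with a chi-squared distribution: with (number of categories - 1) degrees of freedom for the goodness-of-fit test when no parameters are estimated from the data, and with the degrees of freedom returned above for the contingency test.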

Posted in Uncategorized | 1 Comment

Popular packages in R and Python for Data Science

Most of the time, users of R and Python will rely on packages and libraries as far as possible, in order to avoid “reinventing the wheel”. Established packages are also often superior and preferred, due to a lower chance … Continue reading

Posted in Uncategorized | 1 Comment

pip install keeps installing old/outdated packages

This article addresses the following problems: (1) the error “module ‘sklearn.tree’ has no attribute ‘plot_tree’”; (2) pip install (in Spyder, Anaconda Prompt, etc.) does not install the latest version of a package. The leading reason for “module ‘sklearn.tree’ has no attribute ‘plot_tree’” is … Continue reading
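
As a quick diagnostic, the sketch below checks which scikit-learn version the current interpreter actually sees. It assumes the error comes from an old scikit-learn (plot_tree was added in version 0.21); the excerpt is truncated, so this is one plausible cause rather than the post's stated one. The helper names are illustrative.

```python
import re
import sys
from importlib import metadata

# sklearn.tree.plot_tree first appeared in scikit-learn 0.21.
MIN_PLOT_TREE_VERSION = (0, 21)


def parse_version(version):
    """Extract (major, minor) from strings such as '0.24.2' or '0.21rc2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_plot_tree(version):
    """True if this scikit-learn version is new enough to have plot_tree."""
    return parse_version(version) >= MIN_PLOT_TREE_VERSION


def installed_version(package="scikit-learn"):
    """Version of the package visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("Interpreter in use:", sys.executable)
if version is None:
    print("scikit-learn is not installed for this interpreter.")
elif supports_plot_tree(version):
    print(f"scikit-learn {version} should provide sklearn.tree.plot_tree.")
else:
    # Running pip through the same interpreter avoids upgrading a
    # different Python installation by mistake.
    print(f"scikit-learn {version} is too old. Try:")
    print(f"  {sys.executable} -m pip install --upgrade scikit-learn")
```

Running this inside the same environment that raises the error (for example, the Spyder console) shows whether pip and the interpreter are pointing at the same installation.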

Posted in Uncategorized

How to save sklearn tree plot as file (Vector Graphics)

The Scikit-Learn (sklearn) Python package has a nice function, sklearn.tree.plot_tree, to plot (decision) trees. The documentation is found here. However, the default plot produced by calling tree.plot_tree(clf) alone can be low resolution if you try to save it from a … Continue reading
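
One common way to get a sharp, scalable plot is to draw the tree on a Matplotlib figure and save that figure in a vector format. The sketch below is not necessarily the method used in the full post; the iris dataset, the tree depth, the figure size, and the output file names are all arbitrary choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Fit a small example tree (illustrative data and settings).
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# Draw the tree on an explicit figure so its size can be controlled.
fig, ax = plt.subplots(figsize=(12, 8))
plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)

# The file extension selects the output format; SVG and PDF are vector formats.
fig.savefig("tree.svg", bbox_inches="tight")
fig.savefig("tree.pdf", bbox_inches="tight")
plt.close(fig)
```

Vector files stay sharp at any zoom level. If a raster image is needed instead, passing a higher dpi to savefig (for example, fig.savefig("tree.png", dpi=300)) is a simple alternative.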

Posted in Uncategorized | 1 Comment

Making big data a little smaller

While this result is nice, it also seems to mean that, theoretically, we have already reached the limit of dimensionality reduction for data compression. Source: Science Daily. Harvard computer scientist demonstrates 30-year-old theorem still best to reduce data and speed … Continue reading

Posted in math | 1 Comment