Spyder Typing Delay

Recently, the Spyder IDE faced serious typing delay or lag. Basically, after entering something using the keyboard, it takes almost half a second for it to appear on the screen. The issue can also be described as a typing lag in Spyder.

This seems to have been triggered by the installation of Big Sur OS.

Spyder slow on Mac

After intensive googling, we stumbled upon this Github page detailing several possible solutions.

The one that worked for us was installing pyqt and pyqtwebengine. Basically, type the following commands in the terminal:

pip install PyQt5
pip install PyQtWebEngine

The above solution should be very safe since it is just installing Python packages.

Spyder lagging

The above solution helped to solve the troubling issue of Spyder lagging. Since Spyder uses Qt for its GUI, it is critical to keep the various Qt related packages updated / at the correct version. This may be the reason why installing PyQt5 and PyQtWebEngine helps to remove the lag in Spyder.

Spyder very slow

There seems to be many reasons, other than the above, that can result in Spyder being very slow. One tip that is useful, is never update to the latest version of Spyder, Mac OS, or Anaconda immediately once it is released, unless it is absolutely necessary. Most of the bugs appear in the newest releases, and can cause multiple problems including making Spyder very slow. By updating at a later date, most of the bugs would have been solved by then and it is a much safer approach.

Previously, updating to the latest Spyder 4.1.5 also caused several problems, including lag, slowness or even Spyder simply just crashing.

Advertisement

Python matplotlib Plot Multiple Figures in Separate Windows

Matplotlib is a popular plotting package used in Python. There are some things to note for plotting multiple figures, in separate windows.

A wrong approach may lead to matplotlib showing a black screen, or plotting two figures superimposed on each other, which may not be the desired outcome.

Sample Matplotlib Code for Plotting Multiple Figures in Separate Windows

import matplotlib.pyplot as plt

plt.figure()
# plotting function here, e.g. plt.hist()
plt.savefig('filename1')


plt.figure()
# plotting function here, e.g. plt.hist()
plt.savefig('filename2')

plt.show()

One way to do this is to use the plt.figure() command for each figure that you want to plot separately. Optionally, you can use plt.savefig() if you wish to save the figure plotted to the working directory folder.

At the end, use the plt.show() command. The plt.show() command should only be used once per script.

Updating Spyder takes forever

Spyder is a Python IDE that is bundled together with the Anaconda distribution.

There are some problems that are commonly faced when it comes to updating Spyder. One way to update Spyder is to open Anaconda Navigator and click the settings button which has an option to update Spyder. But the problem is that the process can take a very long time. The process shows that it is “loading packages of /User/…/opt/anaconda3”.

Updating Spyder is constricted by …

Another way to update Spyder is to type “conda update spyder” in the terminal. A problem that can crop up is the error message: “updating spyder is constricted by …

Anaconda stuck updating Spyder [Solved]

For my case, it turns out that the version of Anaconda Navigator is outdated. Hence, I first updated Anaconda Navigator to the latest version.

Then, instead of clicking “Update application” which still didn’t quite work, we click on “Install specific version” and choose the latest version of Spyder (Spyder 4.1.5 in this case).

Then, the updating of Spyder in Anaconda Navigator worked perfectly!

How to update Spyder using Anaconda-Navigator: Click “Install specific version” instead of “Update application”.

How to speed up R code in RStudio

I just found out by trial and error that the suppressing of print statements in RStudio greatly speeds up the R code.

In my case, code that was originally estimated to take around 40 hours to run,  just ran in under an hour after I suppressed all the print statements in the for loops.

This is supported by evidence in other forums, for example in StackOverflow: R: Does the use of the print function inside a for loop slow down R?

Basically, if your code prints too much output to the console, it will slow down RStudio and your R code as well. It may be due to all the output clogging up the memory in RStudio. R is known to be “single thread” so it can only use 1 CPU at a time, even if your computer has multiple cores.

Hence, the tips are to:

  • Reduce the number of print statements in the code manually.
  • Set quiet=TRUE in all scan statements. Basically, the default behavior is that scan() will print a line, saying how many items have been read.

This is especially true with for loops, since the amount of printed output can easily number to the millions, and overwhelm RStudio.

How to keep Python / R Running on Mac (without screen lock or sleep)

When the Mac (or MacBook) is running for a long time, it is very liable to do one of the following things:

  • sleep
  • screen saver
  • lock screen

The problem is that your Python program or R program running in the background will most likely stop completely. Sure, it can resume when you activate the Mac again, but that is not what most people want! For one, it may impact the accurate calculation of elapsed time of your Python code.

Changing settings via System Preferences -> Energy Saver is a possible solution, but it is troublesome and problematic:

  • Have to switch it on and off again when not in use (many steps).
  • Preventing sleep may still run into screen saver, screen lock, etc.
  • Vice versa, preventing screen lock may still run into Mac sleeping, etc.

The solution is to install this free App called Amphetamine. Despite its “drug” name, it is a totally legitimate program that has high reviews everywhere. What this app does is to prevent your Mac from stopping, locking or sleeping. Hence, whatever program you are running will not halt till the program is done (or when you switch off Amphetamine).

It is a great program that does its job well! Highly recommended for anyone doing programming, video editing or downloading large files on Mac.

Best way to time algorithms in Python

There are tons of ways to calculate elapsed time (in seconds) for Python code. But which is the best way?

So far, I find that the “timeit” method seems to give good results, and is easy to implement. Source: https://stackoverflow.com/questions/7370801/measure-time-elapsed-in-python

Use timeit.default_timer instead of timeit.timeit. The former provides the best clock available on your platform and version of Python automatically:

from timeit import default_timer as timer

start = timer()
# ...
end = timer()
print(end - start) # Time in seconds, e.g. 5.38091952400282

This is the answer by the user “jfs” on Stack Overflow.

Benefits of the above method include:

  • Using timeit will produce far more accurate results since it will automatically account for things like garbage collection and OS differences (comment by user “lkgarrison”)

Please comment below if you know other ways of measuring elapsed time on Python!

Other methods include:

  • time.clock()  (Deprecated as of Python 3.3)
  • time.time() (Is this a good method?)
  • time.perf_counter() for system-wide timing,
  • or time.process_time() for process-wide timing

 

Python (Anaconda) does not work with MacOS Catalina!

This is just to highlight that the Anaconda Python Distribution does not work with the latest MacOS Catalina. I only realized upon trying to open Anaconda Navigator, after installing Catalina.

The only (good) solution seems to be reinstalling Anaconda.

Source: https://www.anaconda.com/how-to-restore-anaconda-after-macos-catalina-update/

MacOS Catalina was released on October 7, 2019, and has been causing quite a stir for Anaconda users.  Apple has decided that Anaconda’s default install location in the root folder is not allowed. It moves that folder into a folder on your desktop called “Relocated Items,” in the Security folder. If you’ve used the .pkg installer for Anaconda, this probably broke your Anaconda installation.  Many users discuss the breakage at https://github.com/ContinuumIO/anaconda-issues/issues/10998.

Python save csv to folder

In Python (pandas), saving a .csv file to a particular folder is not that hard, but then it may be confusing to beginners.

The packages we need to import are:

import pandas as pd
import os.path

Say, your folder name is called “myfolder”, and the dataframe you have is called “df”. To save it insider “myfolder” as “yourfilename.csv”, the following code does the job:

df.to_csv(os.path.join('myfolder','yourfilename.csv'))

The reason this may be difficult for beginners is that beginners may not know of the existence of the os.path.join method, which is the recommended method for joining one or more path components.

Popular packages in R and Python for Data Science

Most of the time, users of R and Python will rely on packages and libraries as far as possible, in order to avoid “reinventing the wheel”. Packages that are established are also often superior and preferred, due to lower chance of errors and bugs.

We list down the most popular and useful packages in R and Python for data science, statistics, and machine learning.

Packages in R

  • arules
  • arulesViz
  • car
  • caret
  • cluster
  • corrplot
  • ggplot2
  • lattice
  • perturb
  • psych
  • readr
  • recommenderlab
  • reshape2
  • ROCR
  • rpart
  • rpart.plot
  • tidyverse

Python Packages

  • factor_analyzer
  • math
  • matplotlib
  • numpy
  • pandas
  • scipy
  • seaborn
  • sklearn
  • statsmodels

pip install keeps installing old/outdated packages

This article is suitable for solving the following few problems:

  1. module ‘sklearn.tree’ has no attribute ‘plot_tree’
  2. pip install (on Spyder, Anaconda Prompt, etc.) does not install the latest package.

The leading reason for “module ‘sklearn.tree’ has no attribute ‘plot_tree” is because the sklearn package is outdated.

Sometimes “pip install scikit-learn” simply does not update the sklearn package to the latest version. Type “print(sklearn.__version__)” to get the version of sklearn on your machine, it should be at least 0.21.

The solution is to force pip to install the latest package:

pip install --no-cache-dir --upgrade <package>

In this case, we would replace <package>  by “scikit-learn”.


Sometimes, pip install does not work in the Spyder IPython console, it displays an error to the effect that you should install “outside the IPython console”. This is not normal (i.e. it should not happen), but as a quick fix you can try “pip install” in Anaconda Prompt instead. It is likely that something wrong went on during the installation of Anaconda, Python, and the long-term solution is to reinstall Anaconda.

caret package in R: known issue when converting factor variables

In the R language, often you have to convert variables to “factor” or “categorical”. There is a known issue in the ‘caret’ library that may cause errors when you do that in a certain way.

The correct way to convert variables to ‘factor’ is:

trainset$Churn = as.factor(trainset$Churn)

In particular, “the train() function in caret does not handle factor variables well” when you convert to factors using other methods.
(See https://rpubs.com/SulmanKhan/444033)

Basically, if you use other ways to convert to ‘factor’, the code may still run, but there may be some ‘weird’ issues that leads to inaccurate predictions (for instance if you are doing logistic regression, decision trees, etc.)

How to save sklearn tree plot as file (Vector Graphics)

The Scikit-Learn (sklearn) Python package has a nice function sklearn.tree.plot_tree to plot (decision) trees. The documentation is found here.

However, the default plot just by using the command

tree.plot_tree(clf)

could be low resolution if you try to save it from a IDE like Spyder.

The solution is to first import matplotlib.pyplot:

import matplotlib.pyplot as plt

Then, the following code will allow you to save the sklearn tree as .eps (or you could change the format accordingly):

plt.figure()
tree.plot_tree(clf,filled=True)  
plt.savefig('tree.eps',format='eps',bbox_inches = "tight")

To elaborate, clf is your Decision Tree classifier (to be defined before plotting the tree):

# Example from https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(iris.data, iris.target)

The outcome is a Vector Graphics format (.eps) tree that will retain its full resolution when zoomed in. The bbox_inches=”tight” command prevents truncating of the image. Without that command, sometimes the sklearn tree will just be cropped off and be incomplete.

Calculate Cronbach Alpha using Python

R has the package “psych” which allows one to calculate the Cronbach’s alpha very easily just by one line:

psych::alpha(your_data, column_list)

For Python, the situation is more tricky since there does not seem to exist any package for calculating Cronbach’s alpha. Fortunately, the formula is not very complicated and it can be calculated in a few lines.

An existing code can be found on StackOverflow, but it has some small “bugs”. The corrected version is:

def CronbachAlpha(itemscores):
    itemscores = np.asarray(itemscores)
    itemvars = itemscores.var(axis=0, ddof=1)
    tscores = itemscores.sum(axis=1)
    nitems = itemscores.shape[1]

    return (nitems / (nitems-1)) * (1 - (itemvars.sum() / tscores.var(ddof=1)))

The input “itemscores” can be your Pandas DataFrame or any numpy array. (Note that this method requires you to “import numpy as np”).

Python code for PCA Rotation “varimax” matrix

The R programming language has an excellent package “psych” that Python has no real equivalent of.

For example, R can do the following code using the principal() function:

principal(r=dat, nfactors=num_pcs, rotate="varimax")

to return the “rotation matrix” in principal component analysis based on the data “dat” and the number of principal components “num_pcs”, using the “varimax” method.

The closest equivalent in Python is to first use the factor_analyzer package:

from factor_analyzer import FactorAnalyzer

Then, we use the following code to get the “rotation matrix”:

fa = FactorAnalyzer(n_factors=3, method='principal', rotation="varimax")
fa.fit(dat)
print(fa.loadings_.round(2))

How to write Bash file to run multiple Python scripts simultaneously

Step 1 is to create a Bash file (using any editor, even Notepad). Sample code:

#!/usr/bin/env bash
python testing.py &
python testingb.py &

The above code will run two Python files “testing.py” and “testingb.py” simultaneously. Add more python scripts if needed. The first line is called the “shebang” and signifies the computer to run bash (there are various versions but according to StackOverflow the above one is the best).

The above bash file can be saved to any name and any extension, say “bashfile.txt”.

Step 2 is to login to Terminal (Mac) or Putty (Windows).

Type:

chmod +x bashfile.txt

This will make the “bashfile.txt” executable.

Follow up by typing:

nohup ./bashfile.txt

This will run the “bashfile.txt” and its contents. The output will be put into a file called “nohup.out”. The “nohup” option is preferred for very long scripts since it will keep running even if the Terminal closes (due to broken connection or computer problems).

Python Online Courses for Teenagers/Adults

If your child is interested in a Computer Science/Data Science career in the future, do consider learning Python beforehand. Computer Science is getting very popular in Singapore again. To see how popular it is, just check out the latest cut-off point for NUS computer science, it is close to perfect score (AAA/B) for A-levels.

According to many sources, the Singapore job market (including government sector) is very interested in skills like Machine Learning/ Deep Learning/Data Science. It seems that Machine Learning can be used to do almost anything and everything, from playing chess to data analytics. Majors such as accountancy and even law are in danger of being replaced by Machine Learning. Python is the key language for such applications.

I just completed a short course on Python: Python A-Z™: Python For Data Science With Real Exercises! The course fee is payable via Skillsfuture for Singaporeans, i.e. you don’t have to pay a single cent. (You have to purchase it first, then get a reimbursement from Skillsfuture.) At the end, you will get a Udemy certificate which you can put in your LinkedIn profile.

The course includes many things from the basic syntax to advanced visualization of data. It teaches at quite a basic level, I am sure most JC students (or even talented secondary students) with some very basic programming background can understand it.

The best programming language for data science is currently Python. Try not to learn “old” languages like C++ as it can become obsolete soon. Anyway the focus is on the programming structure, it is more or less universal across different languages.icon

Udemy URL: Python A-Z™: Python For Data Science With Real Exercises!

Related posts on Python:

import matplotlib on PyCharm: Quick 3 min solution

Students trying to import the package “matplotlib” on PyCharm will soon face the cryptic error message: “Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework.”

It is extremely puzzling what to do. I have researched the steps that you can follow to solve it in 3 minutes:

First you need to install the “matplotlib” on PyCharm following the instructions here: https://stackoverflow.com/questions/21883768/pycharm-and-external-libraries

Then, to import matplotlib, you need the following lines:

import matplotlib as mpl
mpl.use('TkAgg')
import matplotlib.pyplot as plt

Done! You are ready to use matplotlib on PyCharm interpreter, by using:

plt.plot(...)
plt.show()

Python Math Programming

Recently, I am thinking of learning the Python language for Math programming.

An advantage for using Python for Math Programming (e.g. testing out some hypothesis about numbers), is that the Python programming language theoretically has no largest integer value that it can handle. It can handle integers as large as your computer memory can handle. (Read more at: http://userpages.umbc.edu/~rcampbel/Computers/Python/numbthy.html)

Other programming languages, for example Java, may have a maximum integer value beyond which the program starts to fail. Java integers can only have a maximum value of 2^{31}-1 \approx 2.15 \times 10^9, which is pretty limited if you are doing programming with large numbers (for example over a trillion). For instance, the seventh Fermat number is already 18446744073709551617. I was using Java personally until recently I needed to program larger integers to test out some hypothesis.

How to install Python (free):

Hope this is a good introduction for anyone interested in programming!


Featured book:

Learning Python, 5th Edition

Get a comprehensive, in-depth introduction to the core Python language with this hands-on book. Based on author Mark Lutz’s popular training course, this updated fifth edition will help you quickly write efficient, high-quality code with Python. It’s an ideal way to begin, whether you’re new to programming or a professional developer versed in other languages.

Complete with quizzes, exercises, and helpful illustrations, this easy-to-follow, self-paced tutorial gets you started with both Python 2.7 and 3.3— the latest releases in the 3.X and 2.X lines—plus all other releases in common use today. You’ll also learn some advanced language features that recently have become more common in Python code.

  • Explore Python’s major built-in object types such as numbers, lists, and dictionaries
  • Create and process objects with Python statements, and learn Python’s general syntax model
  • Use functions to avoid code redundancy and package code for reuse
  • Organize statements, functions, and other tools into larger components with modules
  • Dive into classes: Python’s object-oriented programming tool for structuring code
  • Write large programs with Python’s exception-handling model and development tools
  • Learn advanced Python tools, including decorators, descriptors, metaclasses, and Unicode processing