How to speed up R code in RStudio

I just found out by trial and error that suppressing print statements in RStudio can greatly speed up R code.

In my case, code that was originally estimated to take around 40 hours finished in under an hour after I suppressed all the print statements in the for loops.

This is supported by evidence in other forums, for example on Stack Overflow: R: Does the use of the print function inside a for loop slow down R?

Basically, if your code prints too much output to the console, it will slow down RStudio and your R code as well. This may be because all the output clogs up RStudio's memory. R is also single-threaded by default, so it uses only one CPU core at a time even if your computer has multiple cores, and time spent writing to the console is added to the running time of the loop.

Hence, the tips are to:

  • Reduce the number of print statements in the code manually.
  • Set quiet=TRUE in all scan() calls. By default, scan() prints a line saying how many items have been read.

This is especially true with for loops, since the number of printed lines can easily run into the millions and overwhelm RStudio.
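
The same principle applies outside R. As a minimal sketch in Python (the language used for the other examples in these posts), progress can be reported once every N iterations instead of on every pass. The function name run_loop and the parameter report_every are hypothetical, chosen only for illustration; in R the equivalent check would be if (i %% report_every == 0).

```python
def run_loop(n_iterations, report_every=100_000):
    """Run a dummy loop, printing progress only every `report_every` iterations.

    Returns the number of progress lines printed.
    """
    lines_printed = 0
    total = 0
    for i in range(1, n_iterations + 1):
        total += i  # stand-in for the real work done in each iteration
        if i % report_every == 0:
            print(f"completed {i} of {n_iterations} iterations")
            lines_printed += 1
    return lines_printed


# One million iterations produce only ten lines of output.
printed = run_loop(1_000_000)
```

With this pattern the console still shows that the code is alive, but the volume of output no longer grows with every single iteration.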


How to keep Python / R Running on Mac (without screen lock or sleep)

When a Mac (or MacBook) is left running for a long time, it is likely to do one of the following:

  • go to sleep
  • start the screen saver
  • lock the screen

The problem is that your Python or R program running in the background will most likely stop completely. Sure, it can resume when you wake the Mac again, but that is not what most people want! For one, it may distort the measured elapsed time of your Python code.

Changing settings via System Preferences -> Energy Saver is a possible solution, but it is troublesome and problematic:

  • You have to change the setting, then change it back when it is no longer needed (many steps).
  • Preventing sleep may still leave you with the screen saver, screen lock, etc.
  • Vice versa, preventing screen lock may still leave you with the Mac sleeping, etc.

The solution is to install a free app called Amphetamine. Despite its “drug” name, it is a totally legitimate program that is highly rated everywhere. What this app does is prevent your Mac from stopping, locking or sleeping. Hence, whatever program you are running will not halt until it is done (or until you switch off Amphetamine).

It is a great program that does its job well! Highly recommended for anyone doing programming, video editing or downloading large files on Mac.

Best way to time algorithms in Python

There are tons of ways to calculate elapsed time (in seconds) for Python code. But which is the best way?

So far, I find that the “timeit” method seems to give good results, and is easy to implement. Source: https://stackoverflow.com/questions/7370801/measure-time-elapsed-in-python

Use timeit.default_timer instead of timeit.timeit. The former provides the best clock available on your platform and version of Python automatically:

from timeit import default_timer as timer

start = timer()
# ...
end = timer()
print(end - start) # Time in seconds, e.g. 5.38091952400282

This is the answer by the user “jfs” on Stack Overflow.

Benefits of the above method include:

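  • Using timeit will produce far more accurate results since it will automatically account for things like garbage collection and OS differences (comment by user “lkgarrison”). Strictly, it is timeit.timeit that disables garbage collection during the measurement; timeit.default_timer is only the clock, which on Python 3.3 and later is time.perf_counter().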
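
For repeated measurements of a short snippet, timeit.repeat from the standard library is one option. The sketch below is not taken from the Stack Overflow answer; the function build_squares and the counts of 200 calls and 5 measurements are arbitrary choices for illustration.

```python
import timeit


def build_squares(n=1000):
    """Hypothetical workload: build a list of squares."""
    return [i * i for i in range(n)]


# Each measurement calls the function 200 times; take 5 such measurements.
measurements = timeit.repeat(build_squares, number=200, repeat=5)

# The minimum is usually the measurement least disturbed by other
# activity on the machine.
best_per_call = min(measurements) / 200
print(f"best time per call: {best_per_call:.8f} seconds")
```

Taking several measurements and keeping the smallest tends to give steadier numbers than timing a single run.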

Please comment below if you know other ways of measuring elapsed time on Python!

Other methods include:

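  • time.clock() (deprecated since Python 3.3 and removed in Python 3.8)
  • time.time() (works, but it follows the system clock, which can be adjusted while your code is running)
  • time.perf_counter() for system-wide timing (includes time spent sleeping)
  • time.process_time() for process-wide timing (excludes time spent sleeping)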
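
To see the difference between the last two clocks, here is a minimal sketch that times a call to time.sleep with both. The function name measure_sleep and the 0.2 second pause are illustrative assumptions.

```python
import time


def measure_sleep(seconds=0.2):
    """Return (wall_elapsed, cpu_elapsed) for one call to time.sleep."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    time.sleep(seconds)  # the process is idle here and uses almost no CPU
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return wall_elapsed, cpu_elapsed


wall, cpu = measure_sleep(0.2)
print(f"perf_counter: {wall:.3f} s, process_time: {cpu:.3f} s")
```

perf_counter reports roughly the full pause, while process_time reports close to zero because the process was not using the CPU while it slept.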

 

How to write Bash file to run multiple Python scripts simultaneously

Step 1 is to create a Bash file (using any editor, even Notepad). Sample code:

#!/usr/bin/env bash
python testing.py &
python testingb.py &

The above code will run two Python files, “testing.py” and “testingb.py”, simultaneously. Add more Python scripts if needed. The first line is called the “shebang” and tells the system to run the file with bash (there are various versions, but according to Stack Overflow the above one is the best). The “&” at the end of each line starts the script in the background, which is what lets the next one start without waiting.

The above bash file can be saved under any name with any extension, say “bashfile.txt”.

Step 2 is to open Terminal (Mac) or log in via PuTTY (Windows).

Type:

chmod +x bashfile.txt

This will make the “bashfile.txt” executable.

Follow up by typing:

nohup ./bashfile.txt

This will run “bashfile.txt” and its contents. The output will be written to a file called “nohup.out”. The “nohup” command is preferred for very long scripts since they will keep running even if the Terminal closes (due to a broken connection or computer problems).
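
If you would rather launch the scripts from Python itself, the standard subprocess module can do something similar. This is only a sketch of an alternative, not a replacement for the Bash file: the function name run_simultaneously is hypothetical, and unlike nohup it does not by itself keep the scripts alive when the Terminal closes.

```python
import subprocess
import sys


def run_simultaneously(script_paths):
    """Start every script at once, wait for all of them, return exit codes."""
    # Popen returns immediately, so all scripts are started before any finishes.
    processes = [subprocess.Popen([sys.executable, path]) for path in script_paths]
    # wait() blocks until that process ends and returns its exit code.
    return [process.wait() for process in processes]


# Example call, using the file names from the Bash sample above:
# exit_codes = run_simultaneously(["testing.py", "testingb.py"])
```

An exit code of 0 means the script finished without an error, so the returned list gives a quick check that every script completed.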

Python Online Courses for Teenagers/Adults

If your child is interested in a Computer Science/Data Science career in the future, do consider learning Python beforehand. Computer Science is getting very popular in Singapore again. To see how popular it is, just check the latest cut-off point for NUS Computer Science: it is close to a perfect score (AAA/B) for A-levels.

According to many sources, the Singapore job market (including the government sector) is very interested in skills like Machine Learning/Deep Learning/Data Science. It seems that Machine Learning can be used to do almost anything and everything, from playing chess to data analytics. Majors such as accountancy and even law are in danger of being replaced by Machine Learning. Python is the key language for such applications.

I just completed a short course on Python: Python A-Z™: Python For Data Science With Real Exercises! The course fee is payable via SkillsFuture for Singaporeans, i.e. you don’t have to pay a single cent. (You have to purchase it first, then get a reimbursement from SkillsFuture.) At the end, you will get a Udemy certificate which you can put on your LinkedIn profile.

The course covers many things, from basic syntax to advanced visualization of data. It teaches at quite a basic level; I am sure most JC students (or even talented secondary students) with some very basic programming background can understand it.

The best programming language for data science is currently Python. Try not to learn “old” languages like C++, as they may become obsolete soon. In any case, the focus is on programming structure, which is more or less universal across different languages.

Udemy URL: Python A-Z™: Python For Data Science With Real Exercises!
