I just found out by trial and error that suppressing print statements in RStudio greatly speeds up R code.
In my case, code that was originally estimated to take around 40 hours to run, just ran in under an hour after I suppressed all the print statements in the for loops.
This is corroborated on other forums, for example the Stack Overflow question "R: Does the use of the print function inside a for loop slow down R?"
Basically, if your code prints too much output to the console, it will slow down RStudio and your R code as well. It may be due to all the output clogging up memory in RStudio. R is also single-threaded by default, so it can only use one CPU core at a time even if your computer has multiple cores, and any time spent printing directly delays your computation.
Hence, the tips are to:
- Reduce the number of print statements in the code manually.
- Set quiet=TRUE in all scan statements. Basically, the default behavior is that scan() will print a line, saying how many items have been read.
This is especially true with for loops, since the amount of printed output can easily run into the millions of lines and overwhelm RStudio.
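The same effect is easy to demonstrate outside R. As a hedged illustration in Python (not from the original benchmark), you can suppress the prints in a hot loop by redirecting stdout to an in-memory buffer:

```python
import io
import time
from contextlib import redirect_stdout

def loop_with_prints(n):
    """Sum 0..n-1, printing every iteration (the costly part)."""
    total = 0
    for i in range(n):
        print(i)
        total += i
    return total

# Redirecting stdout suppresses the console output entirely
start = time.perf_counter()
with redirect_stdout(io.StringIO()):
    result = loop_with_prints(100_000)
quiet_time = time.perf_counter() - start
```

Printing to a real console is typically far slower than writing to an in-memory buffer, so the gap against an unsuppressed run is usually much larger than this sketch suggests.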
When the Mac (or MacBook) is running for a long time, it is very liable to do one of the following things:
- screen saver
- lock screen
The problem is that your Python program or R program running in the background will most likely stop completely. Sure, it can resume when you activate the Mac again, but that is not what most people want! For one, it may impact the accurate calculation of elapsed time of your Python code.
Changing settings via System Preferences -> Energy Saver is a possible solution, but it is troublesome and problematic:
- Have to switch it on and off again when not in use (many steps).
- Preventing sleep may still run into screen saver, screen lock, etc.
- Vice versa, preventing screen lock may still run into Mac sleeping, etc.
The solution is to install a free app called Amphetamine. Despite its "drug" name, it is a totally legitimate program with high reviews everywhere. What this app does is keep your Mac awake, preventing the screen saver, screen lock, and sleep. Hence, whatever program you are running will not halt until it is done (or until you switch off Amphetamine).
It is a great program that does its job well! Highly recommended for anyone doing programming, video editing or downloading large files on Mac.
There are tons of ways to calculate elapsed time (in seconds) for Python code. But which is the best way?
So far, I find that the “timeit” method seems to give good results, and is easy to implement. Source: https://stackoverflow.com/questions/7370801/measure-time-elapsed-in-python
Use timeit.default_timer instead of timeit.timeit. The former automatically provides the best clock available on your platform and version of Python:
from timeit import default_timer as timer
start = timer()
# ... the code you want to time ...
end = timer()
print(end - start)  # elapsed time in seconds, e.g. 5.38091952400282
This is the answer by the user “jfs” on Stack Overflow.
Benefits of the above method include:
- Using timeit will produce far more accurate results since it will automatically account for things like garbage collection and OS differences (comment by user “lkgarrison”)
Please comment below if you know other ways of measuring elapsed time on Python!
Other methods include:
- time.clock() (deprecated as of Python 3.3, and removed in Python 3.8)
- time.time() (wall-clock time; fine for rough measurements, but it can jump if the system clock is adjusted)
- time.perf_counter() for system-wide timing,
- or time.process_time() for process-wide (CPU) timing
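For instance, a minimal self-contained sketch using time.perf_counter() (the summation is just placeholder work):

```python
import time

start = time.perf_counter()
total = sum(range(1_000_000))  # placeholder for the code being timed
elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.6f} seconds")
```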
This is just to highlight that the Anaconda Python Distribution does not work out of the box with macOS Catalina. I only realized this upon trying to open Anaconda Navigator after installing Catalina.
The only (good) solution seems to be reinstalling Anaconda.
macOS Catalina was released on October 7, 2019, and has been causing quite a stir for Anaconda users. Apple has decided that Anaconda's default install location in the root folder is not allowed. It moves that folder into a folder on your desktop called "Relocated Items," in the Security folder. If you've used the .pkg installer for Anaconda, this probably broke your Anaconda installation. Many users discuss the breakage at https://github.com/ContinuumIO/anaconda-issues/issues/10998.
In Python (pandas), saving a .csv file to a particular folder is not hard, but it can be confusing to beginners.
The packages we need to import are:
import pandas as pd
import os
Say your folder is called "myfolder", and the dataframe you have is called "df". To save it inside "myfolder" as "yourfilename.csv", the following code does the job:
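A minimal sketch (folder and file names as above; the os.makedirs call is an addition here so the code also works when the folder does not exist yet):

```python
import os
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # toy dataframe for illustration
folder = "myfolder"
os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
path = os.path.join(folder, "yourfilename.csv")
df.to_csv(path, index=False)
```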
The reason this may be difficult for beginners is that beginners may not know of the existence of the os.path.join method, which is the recommended method for joining one or more path components.
Most of the time, users of R and Python will rely on packages and libraries as far as possible, in order to avoid “reinventing the wheel”. Packages that are established are also often superior and preferred, due to lower chance of errors and bugs.
We list down the most popular and useful packages in R and Python for data science, statistics, and machine learning.
Packages in R
This article is suitable for solving the following few problems:
- module ‘sklearn.tree’ has no attribute ‘plot_tree’
- pip install (on Spyder, Anaconda Prompt, etc.) does not install the latest package.
The leading reason for "module 'sklearn.tree' has no attribute 'plot_tree'" is that the sklearn package is outdated.
Sometimes "pip install scikit-learn" simply does not update the sklearn package to the latest version. Type "print(sklearn.__version__)" (after import sklearn) to get the version of sklearn on your machine; it should be at least 0.21, the release in which plot_tree was added.
The solution is to force pip to install the latest package:
pip install --no-cache-dir --upgrade <package>
In this case, we would replace <package> by “scikit-learn”.
Sometimes, pip install does not work in the Spyder IPython console; it displays an error to the effect that you should install "outside the IPython console". This is not normal (i.e. it should not happen), but as a quick fix you can try pip install in Anaconda Prompt instead. It is likely that something went wrong during the installation of Anaconda/Python, and the long-term solution is to reinstall Anaconda.
In the R language, often you have to convert variables to “factor” or “categorical”. There is a known issue in the ‘caret’ library that may cause errors when you do that in a certain way.
The correct way to convert variables to ‘factor’ is:
trainset$Churn = as.factor(trainset$Churn)
In particular, “the train() function in caret does not handle factor variables well” when you convert to factors using other methods.
Basically, if you use other ways to convert to 'factor', the code may still run, but there may be some 'weird' issues that lead to inaccurate predictions (for instance if you are doing logistic regression, decision trees, etc.).
The Scikit-Learn (sklearn) Python package has a nice function sklearn.tree.plot_tree to plot (decision) trees; see the official scikit-learn documentation.
However, the default plot produced just by calling
tree.plot_tree(clf)
could be low resolution if you try to save it from an IDE like Spyder.
The solution is to first import matplotlib.pyplot:
import matplotlib.pyplot as plt
Then, the following code will allow you to save the sklearn tree as .eps (or you could change the format accordingly):
plt.savefig('tree.eps', format='eps', bbox_inches="tight")
To elaborate, clf is your Decision Tree classifier (to be defined before plotting the tree):
# Example from https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html
from sklearn import tree
from sklearn.datasets import load_iris
iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(iris.data, iris.target)
The outcome is a vector graphics (.eps) tree that retains its full resolution when zoomed in. The bbox_inches="tight" argument prevents truncation of the image; without it, the sklearn tree may be cropped off and incomplete.
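Putting the pieces together, a self-contained sketch (the Agg backend is selected here so the script also runs without a display; swap .eps for .png etc. as needed):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no window needed
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
tree.plot_tree(clf)
plt.savefig("tree.eps", format="eps", bbox_inches="tight")
```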
R has the package "psych", which allows one to calculate Cronbach's alpha with a single call to its alpha() function.
For Python, the situation is trickier, since there does not seem to be a dedicated package for calculating Cronbach's alpha. Fortunately, the formula is not very complicated, and it can be computed in a few lines.
Existing code can be found on Stack Overflow, but it has some small bugs. A corrected version is:
import numpy as np

def cronbach_alpha(itemscores):  # function wrapper added here for reuse
    itemscores = np.asarray(itemscores)
    itemvars = itemscores.var(axis=0, ddof=1)
    tscores = itemscores.sum(axis=1)
    nitems = itemscores.shape[1]  # number of items (columns); shape alone was the bug
    return (nitems / (nitems - 1)) * (1 - itemvars.sum() / tscores.var(ddof=1))
The input “itemscores” can be your Pandas DataFrame or any numpy array. (Note that this method requires you to “import numpy as np”).
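As a quick sanity check (self-contained, wrapping the formula above in a function; the name cronbach_alpha is chosen here for illustration), two perfectly correlated items should give an alpha of exactly 1:

```python
import numpy as np

def cronbach_alpha(itemscores):
    # rows = respondents, columns = items
    itemscores = np.asarray(itemscores)
    itemvars = itemscores.var(axis=0, ddof=1)
    tscores = itemscores.sum(axis=1)
    nitems = itemscores.shape[1]
    return (nitems / (nitems - 1)) * (1 - itemvars.sum() / tscores.var(ddof=1))

alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])  # perfectly correlated items
```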
The R programming language has an excellent package “psych” that Python has no real equivalent of.
For example, R can do the following code using the principal() function:
principal(r=dat, nfactors=num_pcs, rotate="varimax")
to return the “rotation matrix” in principal component analysis based on the data “dat” and the number of principal components “num_pcs”, using the “varimax” method.
The closest equivalent in Python is to first use the factor_analyzer package:
from factor_analyzer import FactorAnalyzer
Then, we use the following code to get the “rotation matrix”:
fa = FactorAnalyzer(n_factors=3, method='principal', rotation="varimax")
fa.fit(dat)  # after fitting, the rotated loadings are in fa.loadings_
Step 1 is to create a Bash file (using any editor, even Notepad). Sample code:
#!/bin/bash
python testing.py &
python testingb.py &
The above code will run the two Python files "testing.py" and "testingb.py" simultaneously; add more python scripts if needed. The first line, #!/bin/bash, is called the "shebang" and tells the computer to run the file with Bash (there are various versions, but according to Stack Overflow this one works best).
The above bash file can be saved to any name and any extension, say “bashfile.txt”.
Step 2 is to login to Terminal (Mac) or Putty (Windows).
chmod +x bashfile.txt
This will make the “bashfile.txt” executable.
Follow up by typing:
nohup ./bashfile.txt &
This will run "bashfile.txt" and its contents. The output will be put into a file called "nohup.out". The nohup command is preferred for very long scripts, since the job will keep running even if the Terminal closes (due to a broken connection or computer problems).
If your child is interested in a Computer Science/Data Science career in the future, do consider learning Python beforehand. Computer Science is getting very popular in Singapore again. To see how popular it is, just check out the latest cut-off point for NUS Computer Science: it is close to a perfect score (AAA/B) for A-levels.
According to many sources, the Singapore job market (including government sector) is very interested in skills like Machine Learning/ Deep Learning/Data Science. It seems that Machine Learning can be used to do almost anything and everything, from playing chess to data analytics. Majors such as accountancy and even law are in danger of being replaced by Machine Learning. Python is the key language for such applications.
I just completed a short course on Python: Python A-Z™: Python For Data Science With Real Exercises! The course fee is payable via Skillsfuture for Singaporeans, i.e. you don’t have to pay a single cent. (You have to purchase it first, then get a reimbursement from Skillsfuture.) At the end, you will get a Udemy certificate which you can put in your LinkedIn profile.
The course includes many things from the basic syntax to advanced visualization of data. It teaches at quite a basic level, I am sure most JC students (or even talented secondary students) with some very basic programming background can understand it.
The best programming language for data science is currently Python. Try not to start with "old" languages like C++, as they may become obsolete for this field. In any case, the focus is on programming structure, which is more or less universal across different languages.
Udemy URL: Python A-Z™: Python For Data Science With Real Exercises!
Related posts on Python:
Students trying to import the package “matplotlib” on PyCharm will soon face the cryptic error message: “Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework.”
It is puzzling what to do. Here are the steps you can follow to solve it in about 3 minutes:
First you need to install the “matplotlib” on PyCharm following the instructions here: https://stackoverflow.com/questions/21883768/pycharm-and-external-libraries
Then, to import matplotlib, you need the following lines:
import matplotlib as mpl
mpl.use('TkAgg')  # assumption: selecting the TkAgg backend, the usual fix for the "framework" error
import matplotlib.pyplot as plt
Done! You are ready to use matplotlib in the PyCharm interpreter through the plt alias.
Recently, I have been thinking of learning the Python language for math programming.
An advantage of using Python for math programming (e.g. testing hypotheses about numbers) is that Python has no fixed largest integer: it can handle integers as large as your computer's memory allows. (Read more at: http://userpages.umbc.edu/~rcampbel/Computers/Python/numbthy.html)
Other programming languages, for example Java, have a maximum integer value beyond which the program starts to fail. A Java int has a maximum value of 2,147,483,647 (2^31 - 1), and even a long tops out at 9,223,372,036,854,775,807 (2^63 - 1), which is pretty limited if you are programming with large numbers (for example over a trillion). For instance, the seventh Fermat number is already 18446744073709551617. I was using Java personally until I recently needed to handle larger integers to test a hypothesis.
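A quick sketch of the point: in Python, exact integer arithmetic simply works at any size, with no overflow.

```python
# The Fermat number F_6 = 2**(2**6) + 1 (the "seventh", counting from F_0)
f6 = 2 ** 64 + 1
print(f6)              # 18446744073709551617
print(f6 > 2**63 - 1)  # True: larger than Java's long can hold
```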
How to install Python (free):
Hope this is a good introduction for anyone interested in programming!
Learning Python, 5th Edition
Get a comprehensive, in-depth introduction to the core Python language with this hands-on book. Based on author Mark Lutz’s popular training course, this updated fifth edition will help you quickly write efficient, high-quality code with Python. It’s an ideal way to begin, whether you’re new to programming or a professional developer versed in other languages.
Complete with quizzes, exercises, and helpful illustrations, this easy-to-follow, self-paced tutorial gets you started with both Python 2.7 and 3.3— the latest releases in the 3.X and 2.X lines—plus all other releases in common use today. You’ll also learn some advanced language features that recently have become more common in Python code.
- Explore Python’s major built-in object types such as numbers, lists, and dictionaries
- Create and process objects with Python statements, and learn Python’s general syntax model
- Use functions to avoid code redundancy and package code for reuse
- Organize statements, functions, and other tools into larger components with modules
- Dive into classes: Python’s object-oriented programming tool for structuring code
- Write large programs with Python’s exception-handling model and development tools
- Learn advanced Python tools, including decorators, descriptors, metaclasses, and Unicode processing