data science – Mathtuition88

From H2 Probability to “Quant”: What JC Maths Actually Shows Up in Finance

Many Junior College students first hear the word “quant” from university fairs, YouTube, links shared in group chats, or friends already aiming for finance and technology. In everyday speech it is shorthand for quantitative finance: work where mathematics, statistics, and programming meet markets, risk, and pricing. From the perspective of someone staring at this week’s probability tutorial, that world can feel distant or even intimidating.

It is not as distant as it sounds. A large part of the vocabulary is already present in H2 Mathematics, especially the probability and statistics strand. This article offers an honest map of what transfers cleanly, what changes when you leave the exam hall, and how to keep your priorities straight while you explore.

Singapore students are used to a demanding rhythm. Classroom coverage can sit below the difficulty of competitive papers, so disciplined practice matters if you want reliability under time pressure. The same habit serves you when you read optional material about careers. Reading about quant roles should not replace past papers. It can sit beside them as motivation, context, and a reason to take your tutorial work seriously rather than treating it as isolated drill.

What people mean by “quant”

There is no single job titled “quant” everywhere. People use the term for roles that build or use mathematical models. Examples include pricing derivatives, measuring portfolio risk, designing systematic trading signals, stress testing balance sheets, or supporting data-heavy investing and execution. Some quants write production code every day. Others live closer to research, prototyping, and internal tools. Buy-side and sell-side cultures differ, and so do the mixes of mathematics, statistics, software engineering, and communication skills.

What those paths tend to share is comfort with precise reasoning when outcomes are uncertain. That is exactly the skill your better H2 probability questions reward. You define the sample space, assign probabilities consistently, compute summaries, and interpret the result without hand-waving. If you enjoy that clarity, you already understand one reason firms hire mathematical backgrounds even when the financial details come later.

Probability and counting

Combinatorial arguments and finite probability spaces are more than exam staples. They are the grammar of simple models used to compare scenarios and to sanity-check stories that sound plausible until you write them down carefully.

When you enumerate cases, insist probabilities sum to one, check whether events are independent or mutually exclusive, and avoid double counting, you are practising the same discipline that appears in the earliest financial tree models, basic scenario grids, and simple stress tests. You do not need to care about finance to benefit from the reflex that sloppy counting leads to sloppy conclusions.

Conditional probability also deserves a mention. Exam questions train you to update beliefs when new information arrives. In applied settings, people argue about the right conditioning information, but the formal idea that probabilities change when the reference event changes is everywhere once models interact with data feeds, partial observations, and hierarchical risk factors.
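As a small illustration of conditioning, here is a minimal Python sketch that counts outcomes for two fair dice. The dice example and the function names are our own choices for illustration, not taken from any syllabus document.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice, all equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), computed by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))

def sum_is_eight(o):
    return o[0] + o[1] == 8

def first_is_six(o):
    return o[0] == 6

# The same event has a different probability once the reference event changes.
p_unconditional = prob(sum_is_eight)
p_conditional = prob(sum_is_eight, given=first_is_six)
print(p_unconditional, p_conditional)
```

Restricting the sample space to outcomes where the first die shows six is the counting version of "updating when new information arrives".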

Expectation, variance, and how finance borrows the language

Exam papers train you to work with random variables: expectation, variance, linearity of expectation, and rules for sums and scaling. You learn to recognise when a decomposition simplifies a calculation and when independence lets variances add in a clean way.
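The rules above can be checked by brute force on a small example. The sketch below is only an illustration, with a fair die chosen as the example and helper names of our own. It computes expectation and variance directly from a probability table, then confirms that variances add for independent variables and that scaling by 3 multiplies the variance by 9.

```python
from fractions import Fraction

# A fair six-sided die as a table {value: probability}.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expectation(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def sum_of_independent(dist_x, dist_y):
    """Distribution of X + Y when X and Y are independent."""
    total = {}
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            total[x + y] = total.get(x + y, Fraction(0)) + px * py
    return total

two_dice = sum_of_independent(die, die)          # X + Y
scaled = {3 * x + 2: p for x, p in die.items()}  # 3X + 2

print(expectation(die), variance(die))
print(expectation(two_dice), variance(two_dice))
print(expectation(scaled), variance(scaled))
```

Using exact fractions avoids rounding noise, so the identities hold exactly rather than approximately.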

In finance, people often summarise uncertain returns using related language: expected return and volatility, typically tied to standard deviation in introductory discussions. The distributions are not always the ones in your tutorial, and professionals argue constantly about which model fits which asset class, horizon, and regime.

The transferable lesson is structural. Mean and spread are ways to compress a complicated random outcome into something actionable, provided everyone remembers what was assumed and what was ignored. Your syllabus trains you to compute those summaries. Industry often asks you to argue whether the summary is appropriate, stable out-of-sample, and honest about tail risk. That second step is new, but the mathematical objects are familiar.

Distributions you already know, wearing different clothes

The Binomial distribution is a standard part of JC probability. In introductory mathematical finance, binomial trees reuse the same branching intuition. Each period, the world splits into branches with stated probabilities, and you work backwards from future payoffs to a value today. It is a deliberately simplified picture of option pricing, but it is an excellent example of exam mathematics connecting to a workflow people actually teach in finance courses.

You can view it as a disciplined answer to a question students already understand: if upside and downside moves happen with stated probabilities, how do we aggregate uncertainty across steps and translate a future random payoff into a present value under stated rules? Even if you never study finance, the habit of tracking probability mass through a tree is useful preparation for any field that models sequential uncertainty.
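To make the "work backwards through the tree" idea concrete, here is a minimal sketch under stated assumptions. All the numbers (starting price 100, moves of plus or minus 10%, probability 0.5, no discounting, a payoff of max(price - 100, 0)) are made up for illustration. Finance courses usually choose the probability by a no-arbitrage argument rather than taking it as given, which is beyond this sketch.

```python
def binomial_tree_value(s0, up, down, p_up, discount, steps, payoff):
    """Value today of a payoff received after `steps` periods.

    Works backwards through a recombining tree: at each node the value is
    the discounted, probability-weighted average of the two successor nodes.
    """
    # Terminal values: node j has had j up-moves and (steps - j) down-moves.
    values = [payoff(s0 * up**j * down**(steps - j)) for j in range(steps + 1)]
    for _ in range(steps):
        values = [
            discount * (p_up * values[j + 1] + (1 - p_up) * values[j])
            for j in range(len(values) - 1)
        ]
    return values[0]

def call_payoff(price, strike=100.0):
    # Hypothetical payoff: receive the excess of the final price over 100.
    return max(price - strike, 0.0)

value = binomial_tree_value(
    s0=100.0, up=1.1, down=0.9, p_up=0.5, discount=1.0, steps=2,
    payoff=call_payoff,
)
print(value)
```

With these inputs only the "two ups" branch pays anything, and its Binomial probability is 0.25, so the backward calculation agrees with a direct expectation over the number of up-moves.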

The Normal distribution appears everywhere in introductory statistics and often as an approximation when many small shocks add up. You will hear Normal assumptions in basic models of returns. They are convenient and famously imperfect in crises, when correlation spikes and extreme moves cluster.

Again, the JC skill that carries over is not memorising slogans. It is asking what assumption is being made, what breaks when tails matter more than the bell curve allows, and what data might falsify a comforting model.

Where your syllabus includes inference (confidence intervals, hypothesis tests, basic regression ideas), the transferable habit is statistical humility. A noisy sample is not truth. That caution appears whenever someone backtests a strategy on a short window, reports a risk number from limited history, or treats a statistically significant backtest as automatic proof of edge.

What simulation has to do with your lecture notes

Monte Carlo methods sound fancy, but the core idea is modest. You specify a model, draw random outcomes many times, and summarise the distribution of results. That connects directly to the intuition behind long-run averages, variance as spread, and the fact that estimates stabilise as sample size grows when assumptions hold.

You do not need to implement anything at JC level to benefit from the conceptual link. If you understand why repeated sampling produces stable empirical frequencies in well-behaved settings, you understand why simulation is a standard engineering approach when a closed form is messy but the generative story is clear.
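For readers who do want to try it, the following is a minimal sketch using only Python's standard library. The event (two fair dice summing to 7) and the trial counts are our own choices for illustration.

```python
import random

def estimate_probability(trials, seed=0):
    """Estimate P(sum of two fair dice == 7) by repeated sampling."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials

exact = 1 / 6
for trials in (100, 10_000, 200_000):
    estimate = estimate_probability(trials)
    # Print the sample size, the estimate, and its distance from the exact value.
    print(trials, round(estimate, 4), round(abs(estimate - exact), 4))
```

The error typically shrinks as the number of trials grows, which is the long-run frequency idea from lectures in executable form.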

If you want a lightweight interactive illustration, Quantt hosts a Monte Carlo simulator that lets you explore repeated random sampling without committing to a whole textbook side quest.

What H2 does not finish for you

Universities and hiring processes usually expect more than JC core. Typical gaps include programming fluency, linear algebra at a higher level for many routes, time series, numerical methods, optimisation, and domain knowledge about markets, instruments, and conventions. None of that removes the value of H2 probability. It clarifies the division of labour. School gives you a clean conceptual skeleton. Later work adds muscle, messy data, software constraints, and the need to explain assumptions to non-specialists.

There is also a culture gap. Exams reward correct answers under fixed rules. Professional settings reward robustness, documentation, and scepticism about models when incentives push people toward overconfidence. That is not an argument against exams. It is an argument for keeping your mathematical habits intact after grades stop being the only scoreboard.

If you want to see how roles are labelled and what employers discuss in practice, browsing a structured jobs-oriented overview can make the jargon less mysterious. Quantt maintains a quant finance jobs section for that kind of context.

Exploring without derailing A Levels

Curiosity is healthy. Timetable discipline is non-negotiable. Keep your primary effort on mastering the syllabus you will be graded on, especially if you are pushing for competitive papers where speed and accuracy compound.

A practical rule is to treat enrichment like revision spacing. Ten focused minutes after you finish a problem set beats an unfocused hour that interrupts sleep. If you read one external article, write down three precise questions it answered and one precise question it did not. That keeps reading tied to thinking rather than browsing.

When articles mention unfamiliar terms, a short glossary beats guessing from context and accidentally learning the wrong definition. Quantt publishes a glossary of common quant and finance vocabulary.

A note for parents and counsellors

Students exploring careers sometimes receive contradictory advice: specialise early, keep options open, chase prestige, chase passion. Quantitative finance is one pathway among many that reward strong mathematics. It is not the only pathway, and it is not a moral verdict on anyone’s worth if they prefer different fields.

What matters at JC stage is sustainable effort, honest diagnosis of weak topics, and enough sleep to consolidate learning. Optional reading should support those basics, not compete with them.

Closing note

H2 Mathematics exists to train rigorous thinking under explicit rules. Probability becomes powerful when it is treated as a language for uncertainty, not as magic and not as a bundle of formulas to recite under stress.
Quantitative finance is one of several directions where that language appears. It is not the only worthwhile destination, and it should never compete with your immediate exam goals. If you want one place that ties careers, tools, and learning resources together, Quantt is aimed at people exploring quantitative finance in a serious way.

Spyder Typing Delay

Recently, the Spyder IDE faced serious typing delay or lag. Basically, after entering something using the keyboard, it takes almost half a second for it to appear on the screen. The issue can also be described as a typing lag in Spyder.

This seems to have been triggered by the installation of Big Sur OS.

Spyder slow on Mac

After intensive googling, we stumbled upon this GitHub page detailing several possible solutions.

The one that worked for us was installing pyqt and pyqtwebengine. Basically, type the following commands in the terminal:

pip install PyQt5
pip install PyQtWebEngine

The above solution should be very safe since it is just installing Python packages.

Spyder lagging

The above solution helped to solve the troubling issue of Spyder lagging. Since Spyder uses Qt for its GUI, it is critical to keep the various Qt related packages updated / at the correct version. This may be the reason why installing PyQt5 and PyQtWebEngine helps to remove the lag in Spyder.

Spyder very slow

There seem to be many reasons, other than the above, that can result in Spyder being very slow. One useful tip is never to update to the latest version of Spyder, Mac OS, or Anaconda immediately after it is released, unless it is absolutely necessary. Most bugs appear in the newest releases, and they can cause multiple problems, including making Spyder very slow. By updating at a later date, most of the bugs would have been fixed by then, which makes it a much safer approach.

Previously, updating to the then-latest Spyder 4.1.5 also caused several problems, including lag, slowness or even Spyder simply crashing.

Python matplotlib Plot Multiple Figures in Separate Windows

Matplotlib is a popular plotting package used in Python. There are some things to note when plotting multiple figures in separate windows.

A wrong approach may lead to matplotlib showing a black screen, or plotting two figures superimposed on each other, which may not be the desired outcome.

Sample Matplotlib Code for Plotting Multiple Figures in Separate Windows

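import matplotlib.pyplot as plt

# Sample data for illustration only; replace with your own.
data1 = [1, 2, 2, 3, 3, 3]
data2 = [4, 5, 5, 6, 6, 6]

# First figure: plt.figure() starts a new figure for the plots that follow.
plt.figure()
plt.hist(data1)
plt.savefig('filename1')

# Second figure, drawn in its own window.
plt.figure()
plt.hist(data2)
plt.savefig('filename2')

# Call plt.show() once, at the end, to display all the figures.
plt.show()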

One way to do this is to use the plt.figure() command for each figure that you want to plot separately. Optionally, you can use plt.savefig() if you wish to save the figure plotted to the working directory folder.

At the end, use the plt.show() command. The plt.show() command should only be used once per script.

Laptop for data science and machine learning

We recommend and review some laptops/notebooks suitable for data science and machine learning.

First, we state the four most important specs (specifications) for laptops for data science and machine learning.

1) RAM: Should be 8GB or higher.

2) CPU: Should be Intel Core i5, or even better Intel Core i7.

3) Storage: Should be at least 256 GB SSD.

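4) GPU: Optional, but good to have for deep learning. (Note that at the time of writing, Mac GPUs were generally not usable for deep learning, since the major frameworks relied on NVIDIA CUDA for GPU acceleration.)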

RAM and CPU are necessary in order to conduct computations successfully and within a reasonable time. Storage is necessary to process big data which can potentially be several gigabytes. GPU is useful to speed up deep learning, but is optional since there are many cloud or server options available.

For concrete recommendations, we suggest the following 5 laptops. Certainly, there could be other laptops that are equally suitable for data science and machine learning. Do comment below if you have a good suggestion!

1) ASUS ROG Strix Scar 17 Gaming Laptop, 17.3” 300Hz FHD IPS Type, NVIDIA GeForce RTX 2070 Super, Intel Core i7-10875H, 16GB DDR4, 1TB PCIe SSD, Per-Key RGB Keyboard, Wi-Fi 6, Windows 10, G732LWS-DS76

2) Dell XPS 15 7590 Laptop 15.6 inch, 4K UHD OLED InfinityEdge, 9th Gen Intel Core i7-9750H, NVIDIA GeForce GTX 1650 4GB GDDR5, 256GB SSD, 16GB RAM, Windows 10 Home, XPS7590-7572SLV-PUS, 15-15.99 inches

3) New Apple MacBook Pro (16-inch, 16GB RAM, 512GB Storage, 2.6GHz Intel Core i7) – Space Gray

4) 2020 Lenovo ThinkPad T590 15.6″ FHD Full HD (1920×1080) Business Laptop (Intel Quad-Core i7-8565U, 16GB RAM, 512GB SSD) Backlit, Type-C Thunderbolt 3, RJ-45, Webcam, Windows 10 Pro IST Computers

5) HP Pavilion 15-inch Laptop, Intel Core i7, 16 GB RAM, 512 GB SSD Storage, Intel Iris Plus Graphics, Windows 10 Pro, Amazon Alexa Voice Compatible (15-cs3019nr, Mineral Silver)

Updating Spyder takes forever

Spyder is a Python IDE that is bundled together with the Anaconda distribution.

There are some problems that are commonly faced when it comes to updating Spyder. One way to update Spyder is to open Anaconda Navigator and click the settings button which has an option to update Spyder. But the problem is that the process can take a very long time. The process shows that it is “loading packages of /User/…/opt/anaconda3”.

Updating Spyder is constricted by …

Another way to update Spyder is to type “conda update spyder” in the terminal. A problem that can crop up is the error message: “updating spyder is constricted by …”

Anaconda stuck updating Spyder [Solved]

In our case, it turned out that the version of Anaconda Navigator was outdated. Hence, we first updated Anaconda Navigator to the latest version.

Then, instead of clicking “Update application”, which still did not quite work, we clicked “Install specific version” and chose the latest version of Spyder (Spyder 4.1.5 in this case).

After that, updating Spyder in Anaconda Navigator worked perfectly!

How to update Spyder using Anaconda-Navigator: Click “Install specific version” instead of “Update application”.

Best Udemy Data Science / Machine Learning / AI Courses

During this current lockdown period it is a good idea to pick up a data science skill. Most occupations can benefit from such a skill, including engineers, accountants, teachers, even students. Who knows, one day you may find deep learning useful!

On this page we introduce various Udemy courses (which come with certificates that you can put on your LinkedIn profile) that are the best in their class, be it for data science, machine learning (including deep learning), or AI (Artificial Intelligence).

Best Udemy Python Course

Currently, Python is the most popular language for data science and machine learning. R is the second most popular language, and is especially good for statistics.

Hence, this Machine Learning A-Z™: Hands-On Python & R In Data Science Course is perfect as it introduces two of the most popular programming languages in one course! You will learn Machine Learning (ML) in the process as well, which is a great bonus.

If you only want to focus on Python, then check out 2020 Complete Python Bootcamp: From Zero to Hero in Python. It is designed to bring you from zero knowledge to a respectable expert in Python if you complete the course and exercises.

Best Udemy courses for data science

In the Python for Data Science and Machine Learning Bootcamp course, students can learn how to use NumPy, Pandas, Seaborn, Matplotlib, Plotly, Scikit-Learn, Machine Learning, Tensorflow, and more! The aforementioned packages are all classic and popular in data science, data analysis and data visualization.

The Data Science Course 2020: Complete Data Science Bootcamp is another bootcamp style course that gives you complete Data Science training in: Mathematics, Statistics, Python, Advanced Statistics in Python, Machine & Deep Learning. It is especially suitable for beginners, as well as intermediate students who need to brush up on their skills.

Best Udemy course for Deep Learning

Deep learning (DL) is a sub-branch of machine learning that has recently become very popular due to its superior accuracy in tasks such as image classification and NLP (natural language processing).

The Deep Learning A-Z™: Hands-On Artificial Neural Networks course allows students to learn how to create Deep Learning Algorithms in Python from two Machine Learning & Data Science experts. Templates are included, which is very important: you can use and modify the templates to suit your individual task at hand.

Complete Guide to TensorFlow for Deep Learning with Python is a course on how to use Google’s Deep Learning Framework, TensorFlow, with Python, and how to solve problems with cutting-edge techniques. TensorFlow is one of the more popular deep learning frameworks and, at the time of writing, is slightly ahead in popularity compared to its closest rival, PyTorch.

Udemy course benefits

The first benefit of Udemy courses is that you get to learn content from the top trainers. Often, these courses are superior to free YouTube content, and may be even better than the courses in your school.

The second benefit is that Udemy provides a certificate upon completion that you can list in your CV, as well as put in your LinkedIn profile. This is especially important if you are trying to transition into a data scientist job from another field, like engineering or physical sciences.

What is your favorite Udemy course for AI/ML/DL? Feel free to comment below!

Python (Anaconda) does not work with MacOS Catalina!

This is just to highlight that the Anaconda Python Distribution does not work with the latest MacOS Catalina. I only realized this when trying to open Anaconda Navigator after installing Catalina.

The only (good) solution seems to be reinstalling Anaconda.

Source: https://www.anaconda.com/how-to-restore-anaconda-after-macos-catalina-update/

MacOS Catalina was released on October 7, 2019, and has been causing quite a stir for Anaconda users. Apple has decided that Anaconda’s default install location in the root folder is not allowed. It moves that folder into a folder on your desktop called “Relocated Items,” in the Security folder. If you’ve used the .pkg installer for Anaconda, this probably broke your Anaconda installation. Many users discuss the breakage at https://github.com/ContinuumIO/anaconda-issues/issues/10998.

Best Pattern Recognition and Machine Learning Book (Bishop)

Pattern Recognition and Machine Learning (Information Science and Statistics)

The above book by Christopher M. Bishop is widely regarded as one of the most comprehensive books on Machine Learning. At over 700 pages, it has coverage of most machine learning and pattern recognition topics.

It is considered very rigorous for a machine learning (data science) book, yet has a lighter touch than a pure mathematics or theoretical computer science book. Hence, it is perfect as a reference book or even a textbook for students self-learning the subject from the ground up (i.e. students who want to understand instead of just blindly applying algorithms).

A brief overview of the contents covered (taken from the contents page of the book):

Introduction
Probability Distributions
Linear Models for Regression
Linear Models for Classification
Neural Networks
Kernel Methods
Sparse Kernel Machines
Graphical Models
Mixture Models and EM
Approximate Inference
Sampling Methods
Continuous Latent Variables
Sequential Data
Combining Models

2 types of chi-squared test

Most people have heard of the chi-squared test, but not many know that there are (at least) two types of chi-squared tests.

The two most common chi-squared tests are:

1-way classification: Goodness-of-fit test
2-way classification: Contingency test

The goodness-of-fit chi-squared test is used to test proportions, or to be precise, to test whether an observed distribution fits an expected distribution.

The contingency test (the more classical type of chi-squared test) is used to test the independence or relatedness of two categorical variables.
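To show how the two tests differ computationally, here is a minimal Python sketch that computes only the chi-squared statistic and degrees of freedom for each case. The counts are made up for illustration and the function names are our own. It does not compute p-values and applies no continuity correction, so for a 2x2 table the result will differ from R's chisq.test, which applies a continuity correction by default.

```python
def goodness_of_fit_statistic(observed, expected_proportions):
    """Chi-squared statistic for observed counts against expected proportions."""
    total = sum(observed)
    expected = [total * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom

def contingency_statistic(table):
    """Chi-squared statistic for independence in a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Made-up counts, for illustration only.
gof_stat, gof_df = goodness_of_fit_statistic([18, 22, 20], [1 / 3, 1 / 3, 1 / 3])
ct_stat, ct_df = contingency_statistic([[10, 20], [20, 10]])
print(gof_stat, gof_df)
print(ct_stat, ct_df)
```

In the 1-way case the expected counts come from stated proportions, while in the 2-way case they are built from the row and column totals of the table itself.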

The best website I found regarding how to practically code (in R) for the two chi-squared tests is: https://web.stanford.edu/class/psych252/cheatsheets/chisquare.html

I created a PDF copy of the above site, in case it becomes unavailable in the future:

Chi-squared Stanford PDF

Best Videos on each type of Chi-squared test

Goodness of fit Chi-squared test video by Khan Academy:

Contingency table chi-square test:

Popular packages in R and Python for Data Science

Most of the time, users of R and Python will rely on packages and libraries as far as possible, in order to avoid “reinventing the wheel”. Packages that are established are also often superior and preferred, due to lower chance of errors and bugs.

Below we list the most popular and useful packages in R and Python for data science, statistics, and machine learning.

Packages in R

arules
arulesViz
car
caret
cluster
corrplot
ggplot2
lattice
perturb
psych
readr
recommenderlab
reshape2
ROCR
rpart
rpart.plot
tidyverse

Python Packages

factor_analyzer
math
matplotlib
numpy
pandas
scipy
seaborn
sklearn
statsmodels

pip install keeps installing old/outdated packages

This article is suitable for solving the following few problems:

module ‘sklearn.tree’ has no attribute ‘plot_tree’
pip install (on Spyder, Anaconda Prompt, etc.) does not install the latest package.

The leading reason for “module ‘sklearn.tree’ has no attribute ‘plot_tree’” is that the sklearn package is outdated.

Sometimes “pip install scikit-learn” simply does not update the sklearn package to the latest version. To check the version of sklearn on your machine, run “import sklearn” followed by “print(sklearn.__version__)”. It should be at least 0.21, the release in which plot_tree was added.

The solution is to force pip to install the latest package:

pip install --no-cache-dir --upgrade <package>

In this case, we would replace <package> by “scikit-learn”.
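When checking versions in code, note that comparing version strings as plain text can mislead: as text, "0.9" sorts after "0.21". The helper below is a small sketch of our own (not part of pip or sklearn) that compares the leading numeric parts instead.

```python
def version_tuple(version):
    """Turn a version string such as '0.21.3' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric piece, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)

def is_at_least(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)

print(is_at_least("0.20.4", "0.21"))
print(is_at_least("0.21.3", "0.21"))

# Typical use, assuming sklearn is installed:
#   import sklearn
#   is_at_least(sklearn.__version__, "0.21")
```

This handles only simple dotted versions. It is meant as a quick sanity check, not a full version parser.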

Sometimes, pip install does not work in the Spyder IPython console: it displays an error to the effect that you should install “outside the IPython console”. This is not normal (i.e. it should not happen), but as a quick fix you can try “pip install” in Anaconda Prompt instead. It is likely that something went wrong during the installation of Anaconda or Python, and the long-term solution is to reinstall Anaconda.

How to save sklearn tree plot as file (Vector Graphics)

The Scikit-Learn (sklearn) Python package has a nice function sklearn.tree.plot_tree to plot (decision) trees. The documentation is found here.

However, the default plot just by using the command

tree.plot_tree(clf)

could be low resolution if you try to save it from an IDE like Spyder.

The solution is to first import matplotlib.pyplot:

import matplotlib.pyplot as plt

Then, the following code will allow you to save the sklearn tree as .eps (or you could change the format accordingly):

plt.figure()
tree.plot_tree(clf, filled=True)
plt.savefig('tree.eps', format='eps', bbox_inches='tight')

To elaborate, clf is your Decision Tree classifier (to be defined before plotting the tree):

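# Example from https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html
from sklearn import tree
from sklearn.datasets import load_iris
iris = load_iris()  # sample dataset bundled with scikit-learn
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(iris.data, iris.target)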

The outcome is a tree in a vector graphics format (.eps) that retains its full resolution when zoomed in. The bbox_inches='tight' argument prevents the image from being truncated. Without that argument, the sklearn tree is sometimes cropped and incomplete.

Making big data a little smaller

While this result is nice, it also seems to mean that theoretically, we have already reached the limit in dimensional reduction for data compression.

Source: Science Daily

Harvard computer scientist demonstrates 30-year-old theorem still best to reduce data and speed up algorithms

Date: October 19, 2017
Source: Harvard John A. Paulson School of Engineering and Applied Sciences
Summary: Computer scientists have found that the Johnson-Lindenstrauss lemma, a 30-year-old theorem, is the best approach to pre-process large data into a manageably low dimension for algorithmic processing.

When we think about digital information, we often think about size. A daily email newsletter, for example, may be 75 to 100 kilobytes in size. But data also has dimensions, based on the numbers of variables in a piece of data. An email, for example, can be viewed as a high-dimensional vector where there’s one coordinate for each word in the dictionary and the value in that coordinate is the number of times that word is used in the email. So, a 75 Kb email that is 1,000 words long would result in a vector in the millions.

This geometric view on data is useful in some applications, such as learning spam classifiers, but, the more dimensions, the longer it can take for an algorithm to run, and the more memory the algorithm uses.

As data processing got more and more complex in the mid-to-late 1990s, computer scientists turned to pure mathematics to help speed up the algorithmic processing of data. In particular, researchers found a solution in a theorem proved in the 1980s by mathematicians William B. Johnson and Joram Lindenstrauss working in the area of functional analysis.

Known as the Johnson-Lindenstrauss lemma (JL lemma), computer scientists have used the theorem to reduce the dimensionality of data and help speed up all types of algorithms across many different fields, from streaming and search algorithms, to fast approximation algorithms for statistical and linear algebra and even algorithms for computational biology.
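To get a feel for the idea, here is a minimal Python sketch of one standard construction associated with the JL lemma: multiplying the data by a random Gaussian matrix. The article does not specify a construction, so this choice, the dimensions (1000 down to 200) and the random test points are all our own assumptions for illustration.

```python
import math
import random

def random_projection(vectors, target_dim, seed=0):
    """Project vectors to target_dim coordinates with a random Gaussian matrix."""
    rng = random.Random(seed)
    source_dim = len(vectors[0])
    # Scaling by 1/sqrt(target_dim) keeps squared lengths unchanged on average.
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(m * x for m, x in zip(row, vec)) for row in matrix]
        for vec in vectors
    ]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Three made-up points in 1000 dimensions.
rng = random.Random(1)
points = [[rng.random() for _ in range(1000)] for _ in range(3)]
projected = random_projection(points, target_dim=200)

# Ratio of the distance after projection to the distance before.
ratio = distance(projected[0], projected[1]) / distance(points[0], points[1])
print(round(ratio, 3))
```

The ratio should come out close to 1, meaning the distance between the two points is roughly preserved even though each point now has a fifth as many coordinates.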

Source:

Harvard John A. Paulson School of Engineering and Applied Sciences. “Making big data a little smaller: Harvard computer scientist demonstrates 30-year-old theorem still best to reduce data and speed up algorithms.” ScienceDaily. ScienceDaily, 19 October 2017. <www.sciencedaily.com/releases/2017/10/171019101026.htm>.