Pattern Recognition and Machine Learning (Information Science and Statistics)
The above book by Christopher M. Bishop is widely regarded as one of the most comprehensive books on machine learning. At over 700 pages, it covers most machine learning and pattern recognition topics.
It is considered very rigorous for a machine learning (data science) book, yet it has a lighter touch than a pure mathematics or theoretical computer science text. Hence, it works well as a reference book or even as a textbook for students teaching themselves the subject from the ground up (i.e. students who want to understand the algorithms instead of just blindly applying them).
A brief overview of the contents covered (taken from the contents page of the book):
- Introduction
- Probability Distributions
- Linear Models for Regression
- Linear Models for Classification
- Neural Networks
- Kernel Methods
- Sparse Kernel Machines
- Graphical Models
- Mixture Models and EM
- Approximate Inference
- Sampling Methods
- Continuous Latent Variables
- Sequential Data
- Combining Models