Pattern Recognition and Machine Learning (Information Science and Statistics)
The above book by Christopher M. Bishop is widely regarded as one of the most comprehensive books on machine learning. At over 700 pages, it covers most of the core topics in machine learning and pattern recognition.
It is considered very rigorous for a machine learning (data science) book, yet it has a lighter touch than a pure mathematics or theoretical computer science text. Hence, it is ideal as a reference book, or even as a textbook for students teaching themselves the subject from the ground up (i.e. students who want to understand the algorithms rather than apply them blindly).
A brief overview of the chapters (taken from the book's table of contents):

Introduction

Probability Distributions

Linear Models for Regression

Linear Models for Classification

Neural Networks

Kernel Methods

Sparse Kernel Machines

Graphical Models

Mixture Models and EM

Approximate Inference

Sampling Methods

Continuous Latent Variables

Sequential Data

Combining Models