This book focuses on supervised, or predictive, modeling for text: using text data to make predictions about the world around us. Text data contains latent information that can be used for insight, understanding, and better decision-making, and predictive modeling with text can bring that information to light. The natural language that we use as speakers and writers must be dramatically transformed into a machine-readable, numeric representation before it is ready for computation. This book explores typical text preprocessing steps from the ground up and considers the effects of these steps on models. It also covers the categories of supervised machine learning that can be used with text data, from regularized linear models to deep learning models, including how to tune and evaluate them.
An HTML version of this text can be found at
The sources to create the book are available in the GitHub repository