TradeMark Study Guide iCAS CSPA Exam 3: Predictive Modeling Methods and Techniques

Spring 2021 Exam Version 1

About the Book

This study guide equips you with the information you need to pass iCAS CSPA Exam 3. It organizes and summarizes the syllabus materials to guide you through your study process. It also contains exam questions for each topic and recommendations for supplemental study materials. Written by someone who had no experience in R or statistics before taking the exam, this guide is designed to help all exam takers regardless of their background.

About the Author


Tyson Mohr is an actuary with 10+ years of experience in the insurance industry. He has expertise in life insurance, capital modeling, model risk management, and Enterprise Risk Management. He is currently a manager of a data science team focusing on predictive modeling for P&C rating and underwriting. He has a passion for both learning and teaching.

Tyson lives in Bloomington, Illinois with his wife and 3 sons.

Table of Contents

  • Introduction
    • Biographical Note
    • Study Guide Structure
    • Study Tips
    • Strategies for Test Day
    • Feedback
  • A1: Types of Data, Missing and Incomplete Data
    • a. Describe types of data such as discrete and continuous data. Describe special issues that arise in data from surveys
    • b. Describe key patterns of missing data values, including censoring, truncation, missing-at-random, and missing-completely-at-random.
    • c. Describe key underlying causes of missing data. Identify appropriate ways to deal with missing values in a given situation, and identify the advantages and disadvantages of each.
  • A2: Linear Model Diagnostics
    • Extra: Thoughts on how to approach statistical content
    • a. Interpret linear model output such as confidence intervals for parameter estimates and for predictions. Perform, interpret, and act upon standard diagnostics on linear models, including assessment and treatment.
    • b. Understand and apply the hat matrix, hat values, residuals (raw, standardized, Studentized, and Pearson), and Cook’s D to detect outliers and influential observations
    • c. Apply residual plots, marginal model plots, and added variable plots to assess quality of fit and the impact of each predictor
    • d. Use QQ plots to diagnose non-normal errors
    • e. Use F-tests, residual plots, component-plus-residual plots, and CERES plots to identify non-linear dependencies
    • f. Use residual plots and spread-level plots to identify heteroscedasticity; determine when transformation of the target variable (possibly via Box-Cox) is an appropriate remedy, and when weighted regression is appropriate.
    • g. Identify collinearity via variance-inflation factors and generalized variance-inflation factors and discuss possible ways to deal with collinearity
  • A3: Classical Models—Generalized Linear Models and Their Diagnostics
    • a. Understand the assumptions behind different forms of the Generalized Linear Model and be able to select the appropriate model
    • b. Understand the relationship between mean and variance for various models within the GLM family
    • c. Understand how to select the appropriate link function and distribution for the dependent variable.
    • d. Understand the Tweedie as a compound gamma-Poisson and also as the GLM whose variance function is a power law.
    • e. Be able to describe the reason for a double GLM and two ways in which a double GLM might be fit. Be able to describe similarities and differences between a double GLM and a weighted GLM
    • f. Use appropriate diagnostics to evaluate the fit of a GLM
    • g. Describe the effect of a non-canonical link function
    • h. Define deviance and its relationship to a GLM
  • Supplemental Study 1
  • B1: Validation Holdout vs Cross-Validation and Tuning Parameters
    • a. Explain and contrast holdout and cross-validation approaches and the best use of each
    • b,c. For a given dataset and model, use cross-validation to estimate the accuracy of model predictions. Why might this estimate be inaccurate?
  • B2: Evaluation: Goodness of Fit Metrics, Bootstrapping, Bias-Variance Tradeoff, and Presentation of Results
    • a. Define and apply ROC curves, AUC, Lorenz curves, and Gini index
    • b. Estimate variance of model estimates.
    • c. Describe why your model may be biased.
    • d. Describe how to build a model to minimize the expected mean squared error.
    • e,f. What exhibits do you show for the holdout data? What presentation material do you prepare and show?
  • B3: Classification Models and Special Considerations
    • a. Describe and apply the ROC curve in evaluating a classification model
    • b. Define and describe the Bayes error
    • c. Apply linear regression, logistic regression, linear discriminant analysis, quadratic discriminant analysis, and nearest neighbors to fit classification models. Compare and contrast these methods as to when each might be preferable
    • d. Fit a logistic regression by penalized maximum likelihood, and describe when that should be preferred to maximum likelihood
    • e. Describe how unbalanced training datasets can influence classifiers and why that is a problem
    • f, g. Identify algorithmic solutions to using unbalanced training sets, including various undersampling, oversampling, and cost-sensitive learning approaches. Discuss the advantages and drawbacks of each.
  • B4: Shrinkage and Feature Selection Methods
    • a,b. Apply forward stepwise selection. Define “best subset” selection.
    • c. Define a shrinkage method and explain which penalty term corresponds to which method (ridge, lasso)
    • d,e. Use shrinkage methods (lasso and ridge) to improve linear model predictions. Select the tuning parameter for the penalty term. Comment on how this is done.
  • Supplemental Study 2
  • C1: Experimental Design
    • Extra: Introduction to Experiments
    • a. Understand randomized experimental design, including factorial design, randomized block design, and covariance design.
    • b. Understand the importance of assessing power in the design stage
    • c. Understand internal validity and construct validity.
    • d. Understand what external validity is, what the threats to it are, and how it can be improved.
    • e. Understand the Intention-to-Treat principle and apply it in the context of a business experiment
    • f. Understand Simpson’s paradox and explain how it can be misleading
  • C2: Experimental Methods
    • a. Understand choice models
    • b. Given appropriate data, be able to fit a choice model
    • c, d. Understand the use of conjoint analysis with survey data. Given some survey data be able to apply a conjoint model.
    • e. Describe power and significance in A/B testing.
    • f-h. Summarize common mistakes and difficulties in A/B testing and techniques to address these. Describe practical considerations in planning an A/B test. Recognize alternative techniques to classical A/B testing.
  • C3: Causal Inference from Observational Data
    • a,b. Understand coarsened exact matching (CEM), propensity scoring, and model based methods for estimating causal effects, and explain the strengths and weaknesses of each. Discuss the process for using propensity scores and CEM to estimate causal effects.
    • c. Distinguish causal effects from predictions
    • d. Explain SATT (sample average treatment effect on the treated)
  • A4: Hierarchical Models, including Linear Mixed Models, and Buhlmann Credibility
    • d. Build hierarchical models via the linear mixed-effects approach
    • a, c. Describe sources of correlation in longitudinal data. Use appropriate plots to do EDA of longitudinal data
    • b. Describe REML and its rationale
    • e. Apply Buhlmann credibility theory and describe the connection to linear mixed effects models
  • B5 Non-Linear Effects and Additive Models
    • a. Be able to discuss several ways of capturing non-linear relationships in regressions and GLM models, including polynomials, step functions, splines, smoothing splines, and local regression
    • b. Be able to build general additive models (GAM)
  • B6 Single Trees
    • a-d. Build regression and classification trees. Use a tree to determine an estimate for an observation. Discuss reasons for pruning and methods to prune. Implement pruning
  • B7 Ensemble Methods, Random Forests, and Boosting
    • a,b. Be able to fit bagged tree models, boosted tree models, and random forests to data. Be able to use each to get estimates for a new observation. Discuss how each of these methods works, and what its pros and cons are
  • Supplemental Study 3
  • B8 Principal Components Analysis and Unsupervised Learning
    • a,c. Explain and apply principal components analysis. Describe and apply principal components analysis in the context of dimension reduction.
    • b. Differentiate between supervised and unsupervised learning tasks.
    • Extra: Overview of clustering
    • d,f. Describe the choices involved in using k-means and hierarchical clustering and the implications thereof. Summarize potential issues with using clustering and ways to mitigate them.
    • e. Interpret a dendrogram.
    • g. Cluster data using k-means and hierarchical clustering
  • B9 Application Specific Methods: Association Models
    • a,c. Understand the basics of an association model as used in a market basket analysis. Identify the types of patterns that can be detected using simplified association rules.
    • b,d. Interpret rule metrics, including support, confidence, and lift, for an association model. Interpret and evaluate the support, confidence, and lift of a market basket analysis.
  • B10 Application Specific Methods: Fraud Detection
    • a. Apply Pridit and Random Forests to fraud detection problems
  • Appendix: Resources on Statistics Fundamentals
  • Appendix: Overview of R Techniques
    • Introduction
    • R Fundamentals
    • Most Important Processes
    • Packages and Functions
  • Appendix: Spring 2021 Sample Exam Answers
  • Change Log
