SAS for finance

SAS for Finance
Forecasting and data analysis techniques with real-world examples to build powerful financial models

Harish Gulati

BIRMINGHAM - MUMBAI

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Divya Poojari
Content Development Editor: Amrita Noronha
Technical Editor: Nilesh Sawakhande
Copy Editor: Safis Editing
Project Coordinator: Shweta H Birwatkar
Proofreader: Safis Editing
Indexer: Aishwarya Gangawane
Graphics: Jisha Chirayil
Production Coordinator: Shantanu Zagade

First published: May 2018
Production reference: 1250518

Published by Packt Publishing Ltd
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK

ISBN 978-1-78862-456-5

www.packtpub.com
Contributors

About the author

Harish Gulati is a consultant, analyst, modeler, and trainer based in London. He has 15 years' financial, consulting, and project management experience with leading banks, management consultancies, and media hubs. In his spare time, he enjoys demystifying his complex line of work, which has led him to become an author and speaker at analytical forums. He has also co-authored Role of a Data Analyst, published by the British Chartered Institute of IT (BCS). He has an MBA in brand communications and a degree in psychology and statistics.

About the reviewer

Rashmi Gupta is an entrepreneur and consultant for established media and financial brands in the field of marketing and digital analytics. She is currently the director of Agile Fintech Partners. Artificial intelligence is a subject area that interests her, and she is currently building her expertise in it.

Table of Contents

Preface
Chapter 1: Time Series Modeling in the Financial Industry
  Time series illustration
  The importance of time series
  Forecasting across industries
  Characteristics of time series data
  Seasonality
  Trend
  Outliers and rare events
  Disruptions
  Challenges in data
  Influencer variables
  Definition changes
  Granularity required
  Legacy issues
  System differences
  Source constraints
  Vendor changes
  Archiving policy
  Good versus bad forecasts
  Use of time series in the financial industry
  Predicting stock prices and making portfolio decisions
  Adhering to Basel norms
  Demand planning
  Inflation forecasting
  Managing customer journeys and maintaining loyalty
  Summary
  References
Chapter 2: Forecasting Stock Prices and Portfolio Decisions using Time Series
  Portfolio forecasting
  A portfolio demands decisions
  Forecasting process
  Visualization of time series data
  Business case study
  Data collection and transformation
  Model selection and fitting
  Part A – Fit statistics
  Part B – Diagnostic plots
  Part C – Residual plots
  Dealing with multicollinearity
  Role of autocorrelation
  Scoring based on PROC REG
  ARIMA
  Validation of models
  Model implementation
  Recap of key terms
  Summary
Chapter 3: Credit Risk Management
  Risk types
  Basel norms
  Credit risk key metrics
  Exposure at default
  Probability of default
  Loss given default
  Expected loss
  Aspects of credit risk management
  Basel and regulatory authority guidelines
  Governance
  Validation
  Data
  PD model build
  Genmod procedure
  Proc logistic
  Proc Genmod probit
  Summary
Chapter 4: Budget and Demand Forecasting
  The need for the Markov model
  Business problem
  Markovian model approach
  ARIMA model approach
  Markov method for imputation
  Summary
Chapter 5: Inflation Forecasting for Financial Planning
  What is inflation?
  Reasons for inflation
  Inflation outcome and the Phillips curve
  Winners and losers
  Business case for forecasting inflation
  Data-gathering exercise
  Modeling methodology
  Multivariate regression model
  Forward selection model
  Backward selection
  Maximize R
  Univariate model
  Summary
Chapter 6: Managing Customer Loyalty Using Time Series Data
  Advantages of survival modeling
  Key aspects of survival analysis
  Data structure
  Business problem
  Data preparation and exploration
  Non-parametric procedure analysis
  Survival curve for groups
  Survival curve and covariates
  Parametric procedure analysis
  Semi-parametric procedure analysis
  Summary
Chapter 7: Transforming Time Series – Market Basket and Clustering
  Market basket analysis
  Segmentation and clustering
  MBA business problem
  Data preparation for MBA
  Assumptions for MBA
  Analysis of a set size of two
  A segmentation business problem
  Segmentation overview
  Clustering methodologies
  Segmentation suitability in the current scenario
  Segmentation modeling
  Summary
Other Books You May Enjoy
Index

Preface

SAS is the world's largest privately held software business. It offers an integrated suite of software solutions to manage data, produce reports, and build statistical models.

Who this book is for

The book introduces statistical models used in the finance industry in a simplified manner. Its real-world examples are supported by data and by code that reproduces the models. The chapters explain the relevance of each model to business problems, and the discussions of diagnostics explain how the models can be implemented. Rather than focusing on equations, the book relies on graphical illustrations to help the reader understand
complex models. The book is designed to be a quick introduction to various modeling techniques, explaining their key concepts.

The intended reader is someone aspiring to work in the financial industry, or one of the many financial industry professionals who want to explore its various facets. The reader could also be a student curious to know how theoretical knowledge is applied in the industry, or a finance professional who wants to upskill and move on to another role. The book's audience may also include anyone who works as a data analyst, data scientist, data architect, data engineer, analytics and insights professional, or business analyst, or who integrates the outputs of models into business strategy without knowing how the underlying problems are solved.

What this book covers

Chapter 1, Time Series Modeling in the Financial Industry, introduces time series modeling, discusses its importance and the characteristics and challenges of time series data, and explains its use in the financial industry. The chapter also discusses how forecasting is used across industries and what is meant by a good or bad forecast.

Chapter 7: Transforming Time Series – Market Basket and Clustering

Unfortunately, SAS University Edition, on which the modeling for this book was done, doesn't support some of the visual features that effectively showcase the output of PROC TREE. The full edition of SAS produces a more visually appealing chart of the procedure. Here, only the tabular output is shared, in Figure 7.17.

The procedure assigns a cluster number to each customer ID. This is an important step, because once every customer has a cluster number, we can go on to profile the constituents of each cluster. This is a five-cluster PROC TREE specification:

    /* Cut the tree stored in the TREE dataset into five clusters.        */
    /* OUT= receives one row per customer (custid) with its CLUSTER number */
    proc tree data=tree out=cluster_output nclusters=5;
        id custid;
        /* Carry the profiling variables through to the output dataset */
        copy age aum risk_appetite fund_performance investment_potential
             investment_involvement complex_product;
    run;

    /* List the cluster allocation, dropping the CLUSNAME variable */
    proc print data=cluster_output(drop=clusname);
    run;

Figure 7.17: Customer cluster allocation

Although there was a fair degree of confidence in the model generated so far, many alternative models were considered during the build phase. They differed in the type of clustering, the data standardization, and the mix of variables used. Let's produce one such alternative model using the following code:

    /* Alternative model: age and aum have been dropped */
    proc varclus data=cluster_model;
        var risk_appetite fund_performance investment_potential
            investment_involvement complex_product;
    run;

The output in Figure 7.18 shows that, once these two key variables were dropped, the model did not split at all and instead suggested a single-cluster solution:

Figure 7.18: Alternate modeling attempt

Prof Cox gave the five clusters names and recommended a strategic direction for Vogue to take with each of them. The business expects a modeler's work to be statistically robust, validated, and documented with a high degree of governance during the build, approval, and implementation phases. However, modelers sometimes forget how much simple summaries can help others understand a model and trust the insights it generates. The next few pages show how Prof Cox summarized the output from the model:

Figure 7.19: Segment profile as defined by the modeler

She further described the segments in Figure 7.19 as follows:

Star performers: These customers are in the ideal age group and are probably the happiest Vogue customers in terms of returns on investment. They have a high level of AUM and the potential to invest further. They are medium risk takers and have no preference between simple and complex products. They are the smallest of the five clusters.

Cash cows: These customers are split across the age groups. They behave like cash cows for Vogue: they have low-to-medium AUM, yet they are large in number and help to sustain the business. They are involved in their decisions and aren't risk takers. Of these customers, 76% have experienced low returns with Vogue. However, because of their low risk appetite and their general inertia about moving to competitors, they are thought to be good for the business in the long run. A further 18% have experienced medium returns, and this percentage could increase in the years to come.

Nurture: These customers are split across the age groups and have a medium-to-high potential to invest. They have experienced mixed fund performance. Their risk appetite is quite high, and they don't tend to hold complex products. The most promising feature of this segment is that its members currently have only low-to-mid-level AUM: given their potential to invest, a bit more focus could make this a more exciting segment for Vogue. After all, it is Vogue's second-biggest segment.

Keen but not there: A fairly big cluster of younger individuals with low involvement, low fund performance, low potential to invest, and low-to-mid AUM. They are high risk takers who tend to hold complex products. They could also benefit from being nurtured, but they are probably not yet mature enough as prospective clients. Vogue should continue to watch this segment and be patient with it.

Going nowhere: This segment has young members with low involvement in investing and lower investment potential. The lower potential might stem from the fact that they already tend to have high AUM with Vogue. However, the predominantly low fund performance they may be experiencing could also be a factor in their lower investment potential. Vogue needs to review this segment and decide whether relationship managers should focus less on these customers.

So, how did Prof Cox define and describe the segments? The output dataset produced by the PROC TREE code (shown in Figure 7.17) stores a cluster against each customer. However, the cluster is only a number; it isn't a descriptive field that captures the characteristics of the cluster or segment. Prof Cox therefore produced profiling tables from the output dataset, counting the customers in each cluster by age (Young, Mid, Senior), AUM (Low, Med, High), risk appetite (Low, Med, High), fund performance (Low, Med, High), investment potential (Low, Med, High), investment involvement, and complex product holding (No, Yes). The share of customers in each cluster is:

    Cluster   % of customers
    1         13%
    2         32%
    3         24%
    4         16%
    5         15%

Cluster summary:

    Cluster   Age               AUM             Risk appetite   Fund performance
    1         Young to medium   High            Medium          High
    2         Split across      Low to medium   Low             Low
    3         Split across      Low to medium   High            Split across
    4         Young             Low to medium   High            Low
    5         Young             High            Split across    Low

Cluster summary continued:

    Cluster   Investment potential   Investment involvement   Complex product
    1         High                   Medium to high           No preference
    2         Low                    High                     Yes
    3         Medium to high         Medium to high           No
    4         Low to medium          Low                      Yes
    5         Low                    Low                      Yes

Segment allocation:

    Cluster   Segment
    1         Star performers
    2         Cash cows
    3         Nurture
    4         Keen but not there
    5         Going nowhere

Figure 7.20: Segment allocation details

As Figure 7.20 shows, profiling tables are needed to understand the characteristics of each segment's constituents. We can already see that the clusters differ in their mix of age groups, assets under management, and risk appetite. By adding profiles of the other clustering variables, we can come up with descriptions of the segments. Prof Cox went a step further and named the segments. Names make the key characteristics of each segment easy to remember, and some businesses prefer names that explain what each segment means for their strategy. There is no specific scientific way to describe and name a segment: the insights from profiling should make business sense, and the naming of segments should help achieve business goals. Having named her segments and prepared her proposed strategy for each of them, Prof Cox looked forward to sharing the modeling results with Vogue.

Summary

We have looked at two analysis methodologies in this chapter: market basket analysis (MBA) and segmentation. Both use datasets related to time series, but they transform the data and don't explicitly consider the time element as part of the analysis. In MBA, we saw that sequence acted as a sort of proxy for time. We generated insights by focusing on the business problems that were the subject of modeling. MBA lacks the statistical depth and rigor of clustering; however, neither is a strictly statistically driven analysis. With MBA, we showed how it makes intuitive sense to evaluate the association between the products and services offered by a bank and to leverage that information. In segmentation via clustering, we showed how the number of clusters we arrived at differed from the statistically preferred solutions suggested by two different methodologies. Yet we decided on an approach that made business sense and that we could support by presenting the analysis conducted.
