
Logistic regression explained from scratch



Logistic Regression Explained from Scratch (Visually, Mathematically and Programmatically)

Hands-on Vanilla Modelling Part III · Abhibhav Sharma · 14 min read

(Image by author)

A plethora of results appears on even a quick Google search for "logistic regression". Beginners in data science often find it very confusing to get their heads around the main idea behind logistic regression, and why wouldn't they be confused!? Every tutorial, article, or forum has a different narration of logistic regression (not counting the legitimately verbose textbooks, since those would kill the entire purpose of these "quick sources" of mastery). Some sources call it a "classification algorithm" and some more sophisticated ones call it a "regressor"; either way, the underlying idea and its utility remain unrevealed. Remember that logistic regression is the basic building block of artificial neural networks, and a missing or fallacious understanding of it can make the advanced formalisms of data science really difficult to grasp.

Here, I will try to shed some light on and inside the logistic regression model and its formalisms in a very basic manner, in order to give readers a sense of understanding (hopefully without confusing them). The simplicity offered here comes at the cost of skipping the in-depth details of some crucial aspects; getting into the nitty-gritty of each aspect of logistic regression would be like diving into a fractal (there would be no end to the discussion). For each such concept, however, I will provide eminent readings/sources that one should refer to.

There are two major branches in the study of logistic regression: (i) modelling and (ii) post-modelling analysis (using the logistic regression results). While the latter is the measure of effect
from the fitted coefficients, I believe that the black-box aspect of logistic regression has always been in its modelling.

My aim here is:

- To elaborate logistic regression in the most layman way
- To discuss the underlying mathematics of two popular optimizers employed in logistic regression (gradient descent and Newton's method)
- To create a logistic-regression module from scratch in R for each type of optimizer

One last thing before we proceed: to avoid complexity, this entire article is designed with the binary classification problem in mind.

Logistic regression is NOT A CLASSIFIER. Yes, it is not. It is rather a regression model in the core of its heart. I will depict the what and why of logistic regression while preserving its resonance with the linear regression model. Assuming that my readers are somewhat aware of the basics of linear regression, it is easy to say that linear regression predicts a "value" of the target variable through a linear combination of the given features, while logistic regression predicts a "probability value" through a linear combination of the given features plugged inside a logistic function (aka inverse-logit), given as eq (1):

p = 1/(1 + e^(−z)), where z = β₀ + β₁x₁ + … + βₙxₙ ……………………………….(eq 1)

Logistic Function (Image by author)

Hence the name logistic regression. This logistic function is a simple strategy to map the linear combination z, lying in the (−inf, inf) range, to the probability interval [0, 1] (in the context of logistic regression, this z is called the log-odds, or logit, or log(p/(1−p))) (see the plot above). Consequently, logistic regression is a type of regression whose range of mapping is confined to [0, 1], unlike the simple linear regression model, whose domain and range can take any real value.

A small sample
of the data (Image by author)

Consider simple data with one variable and its corresponding binary class, either 0 or 1. The scatter plot of this data looks something like Fig A (left). We see that the data points lie in two extreme clusters. Good; now, for our prediction modelling, a naive regression line in this scenario gives a nonsense fit (red line in Fig A, right), and what we actually need to fit is a squiggly line (the curvy "S"-shaped blue rule in Fig A, right) that explains (or correctly separates) the maximum number of data points.

Fig A (Image by author)

Logistic regression is a scheme to search for this most optimum blue squiggly line. First, let's understand what each point on this squiggly line represents: for any variable value projected onto this line, the squiggly line gives the probability (say p) of falling in class 1 for that projected variable value. Accordingly, the line tells us that all the bottom points lying on this blue line have zero chance (p = 0) of being in class 1, and the top points lying on it have probability 1 (p = 1) of the same.

Now, remember that I mentioned that the logistic function (aka inverse-logit) is a strategy to map the infinitely stretching space (−inf, inf) to the probability space [0, 1]; conversely, the logit function transforms the probability space [0, 1] to a space stretching over (−inf, inf), eq (2) & Fig B:

logit(p) = log(p/(1−p)) ……………………………….(eq 2)

Fig B: The logit function, log(p/(1−p)), maps each probability value to a point on the number line {ℝ} stretching from −infinity to infinity (Image by author)

Keeping this in mind, here comes the mantra of logistic regression modelling: logistic regression starts by first Ⓐ transforming the space of class probability [0, 1] vs variable {ℝ} (as in Fig A, right) to the space
of logit {ℝ} vs variable {ℝ}, where a "regression-like" fitting is performed by adjusting the intercept and slope in order to maximize the likelihood (a very fancy piece of machinery that I will elaborate on in a coming section). Ⓑ Once the tweaking and tuning are done, the logit {ℝ} vs variable {ℝ} space is remapped to the class probability [0, 1] vs variable {ℝ} space using the inverse-logit (aka logistic function). Performing this cycle iteratively (Ⓐ → Ⓑ → Ⓐ) eventually results in the most optimum squiggly line, i.e. the most discriminating rule. WOW!!!

Well, you may (and should) ask: (i) why, and how, do we perform this transformation?; (ii) what the heck is likelihood?; and (iii) how does this scheme lead to the most optimum squiggle?! As for (i), the idea of transforming a confined probability space [0, 1] to an infinitely stretching real space (−inf, inf) is that it makes the fitting problem very close to solving a linear regression, for which we have a lot of optimizers and techniques to fit the most optimum line. The latter questions will be answered eventually.

Now, coming back to our search for the best classifying blue squiggly line: the idea is to first plot an initial linear regression line with arbitrary coefficients in the ⚠logit vs variable space⚠ and then adjust the coefficients of this fit to maximize the likelihood (relax!! I will explain "likelihood" when it is needed). In our one-variable case, we can write equation 3:

logit(p) = log(p/(1−p)) = β₀ + β₁*v ……………………………….(eq 3)
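The pair of mappings in eq (1) and eq (2) can be sketched in a few lines of code. The article builds its modules in R; the snippet below is a minimal Python illustration of the same idea, showing that the logistic function and the logit are inverses of each other.

```python
import math

def logistic(z):
    """Inverse-logit (eq 1): map z in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Logit (eq 2): map a probability in (0, 1) back to the log-odds in (-inf, inf)."""
    return math.log(p / (1.0 - p))

# z = 0 sits exactly at probability 0.5,
# and applying logit after logistic recovers the original z.
print(logistic(0.0))           # 0.5
print(logit(logistic(2.0)))    # 2.0 (up to floating-point error)
```

Note how extreme values of z are squashed toward 0 and 1, which is exactly what confines the regression's range to the probability interval.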
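The Ⓐ → Ⓑ → Ⓐ cycle for the one-variable model of eq (3) can also be sketched. This is a hedged Python illustration, not the article's R module: it maximizes the log-likelihood by plain gradient ascent on β₀ and β₁ (equivalent to gradient descent on the negative log-likelihood), and the toy two-cluster data mimicking Fig A is invented for the example.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, iters=5000):
    """Fit logit(p) = b0 + b1*v (eq 3) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0  # arbitrary starting coefficients
    n = len(x)
    for _ in range(iters):
        # B step: map the current line back to probabilities via the logistic function
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood: residuals (y - p), optionally weighted by x
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # A step: adjust the coefficients of the line in logit space
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Toy data: two extreme clusters, as in Fig A (values invented for illustration)
x = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5]
y = [0,   0,   0,   0,   1,   1,   1,   1]
b0, b1 = fit_logistic(x, y)
# The fitted squiggle separates the clusters: p < 0.5 on the left, p > 0.5 on the right
print(logistic(b0 + b1 * 1.0), logistic(b0 + b1 * 7.0))
```

Each iteration performs exactly the cycle described above: evaluate probabilities through the inverse-logit, measure how well the line explains the classes, and nudge the line in logit space.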

Posted: 09/09/2022, 19:48
