Speaker 1: In this module we're going to be covering the use of regression to help us make specific business decisions in a marketing context. Now I know lots of you have covered regression in other courses, and that's the main reason we're going to skip over things like the assumptions and necessary conditions for running a regression. Regression can be used for lots of things, including decision making, forecasting, and even interpreting the results of some experiments. To get us started I want to focus on a specific example, and for that example we'll be using the Minute Maid orange juice data set. This is a set of data from an orange juice producer, and we're going to set up the scenario as follows. You've been hired as a consultant for the Minute Maid company, a major producer of orange juice. Before going into an important meeting with senior management, you have been asked to analyze the sales data for Minute Maid orange juice in the southern California market. To assist in your deliberations, some data have become available for one of your key accounts, the largest grocery chain in that market. The data were collected from weekly store scanner records that capture information such as sales, the number of cartons sold, the sale price, and other promotion information for each product. Management in this case is particularly interested in understanding how different pricing strategies affect sales. So before getting into regression, we can take a look at the data set. Here we see that each row of data represents a week. We have the number of sales of orange juice in general, the number of sales for Minute Maid, the price of Minute Maid, and the prices of three competitors: Tropicana Premium, Tropicana Regular, and the store brand.
We also have zero-one indicator variables for whether an advertisement was displayed for Minute Maid, Tropicana Premium, Tropicana Regular, and the store brand in that given week. So for instance, in this first row of data there were a thousand sales, 66 of which were for Minute Maid. Minute Maid charged $2.99, Tropicana Premium was $3.66, Tropicana Regular was $2.39, and the store brand was $2.49. Minute Maid did not advertise, Tropicana Premium did not advertise, Tropicana Regular did advertise, and the store brand also advertised. So one of the things we could do is take a look at the relationship between something like price and sales. Here I'm going to put up a graph where we have the week on the x-axis, the number of sales of Minute Maid orange juice on the left y-axis, and the price on the right y-axis. Just eyeballing this, it's actually pretty difficult to tell whether there's a relationship, so instead why don't we look at the graph in a slightly different way. What I'll do is draw a scatter plot, this time with price on the x-axis and sales on the y-axis. And what we see is that there looks to be a pretty clear negative relationship, such that as price increases, sales decrease. Of course, the point of statistics is that we don't want to be eyeballing these types of relationships, we actually want to test for them. Now one way you could do this, by the way, is with just a simple correlation, which we learned not too long ago. But a correlation doesn't actually tell us much about the nature of the relationship, it merely tells us that a relationship exists. And so for that we need regression, and in particular we'll be looking at a linear sales model. We want to explain the variation in sales as a function of price. In other words, prices fluctuate, sales fluctuate, and we want to understand whether we can explain the fluctuation in sales as a function of the fluctuations in price.
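As a rough illustration of the simple correlation mentioned above, here is a minimal Python sketch. The lecture itself works in SPSS, so the use of NumPy here is an assumption, and every price and sales value below is invented for illustration (only the $2.99 and 66 pairing echoes the first row described in the transcript). The coefficient r gives the direction and strength of the linear association, but not how many units sales change per dollar of price, which is what the regression is for.

```python
import numpy as np

# Hypothetical weekly prices (dollars) and Minute Maid unit sales.
# These values are made up for illustration; they are NOT the course data set.
price = np.array([2.99, 2.79, 2.49, 2.99, 2.29, 2.59, 2.89, 2.39])
sales = np.array([66, 110, 205, 80, 290, 170, 95, 240])

# Pearson correlation between price and sales
r = np.corrcoef(price, sales)[0, 1]
print(f"correlation between price and sales: {r:.2f}")
```

A negative r here matches the downward-sloping scatter plot described in the lecture.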
So we assume that sales and price are related in the following way. We say that sales at any given time t, which is s sub t, is equal to some constant, which we'll call beta zero, plus some other coefficient, beta one, times the price in that time period, p sub t, plus an error term. We're going to leave the error term alone for now. We are now assuming that sales in week t is a linear function of price, plus some randomness that we can't pick up, which is the epsilon, the error. What we need to do is find beta zero and beta one, because beta one is going to tell us the degree of relationship between price and sales. In other words, it'll tell us what happens to sales as price changes. So what we'll do is first look at how to do this in SPSS, and then I'll unpack the intuition. In SPSS, this is pretty simple. We go to Analyze, Regression, Linear. And what we want to do is say, what are we predicting? Well, we're predicting the sales of Minute Maid orange juice, so MM sales, as a function of the price, Minute Maid price. For the time being, we're going to leave all other options as is, and we're just going to ask SPSS to run this. We get a few tables, and we want to focus on a couple of them. First of all, the model summary. The model summary simply tells us the degree to which we're able to explain the variation in sales as a function of price. We can look at the R-squared, or in fact at the adjusted R-squared, which takes into account the number of variables that we've put into our model. What this tells us is that 40% of all the variation in sales can be explained by the variation in price. You could say this a different way. You could say that 60% of the variation in sales is unexplained by price. In other words, there are other factors driving the change in sales besides the change in price.
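The model described above, sales_t = beta0 + beta1 * price_t + error_t, can be sketched outside SPSS as well. The following is a minimal Python sketch of an ordinary least-squares fit with one predictor, along with R-squared and adjusted R-squared. The function name `fit_simple_regression` and all data values are hypothetical, chosen only for illustration, so the numbers it prints will not match the SPSS output discussed in the lecture.

```python
import numpy as np

def fit_simple_regression(price, sales):
    """Least-squares fit of sales = b0 + b1 * price.

    Returns (b0, b1, r2, adj_r2).
    """
    price = np.asarray(price, dtype=float)
    sales = np.asarray(sales, dtype=float)
    n = len(price)

    # Deviations from the means
    p_dev = price - price.mean()
    s_dev = sales - sales.mean()

    # Slope and intercept from the closed-form least-squares solution
    b1 = (p_dev * s_dev).sum() / (p_dev ** 2).sum()
    b0 = sales.mean() - b1 * price.mean()

    # Share of the variation in sales explained by price
    residuals = sales - (b0 + b1 * price)
    r2 = 1.0 - (residuals ** 2).sum() / (s_dev ** 2).sum()

    # Adjusted R-squared penalizes for the number of predictors (k = 1 here)
    k = 1
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return b0, b1, r2, adj_r2

# Hypothetical data, invented for illustration; NOT the course data set
price = [2.99, 2.79, 2.49, 2.99, 2.29, 2.59, 2.89, 2.39]
sales = [66, 110, 205, 80, 290, 170, 95, 240]

b0, b1, r2, adj_r2 = fit_simple_regression(price, sales)
print(f"beta0 = {b0:.1f}, beta1 = {b1:.1f}")
print(f"R-squared = {r2:.3f}, adjusted R-squared = {adj_r2:.3f}")
```

With a single predictor the adjusted R-squared is only slightly below the plain R-squared; the gap grows as more variables are added relative to the number of weeks.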
Moving down to the coefficients, and actually I'll just take these and transpose them so we can put some annotation on top. What we see is that we can identify the beta zero and beta one from our equation. If you recall, we said that sales is a function of some constant plus price times some coefficient plus some error. And what we have here is that this is the beta zero, and this is the beta one. Over here we have the standard errors of these estimates, which can roughly be described as the degree of uncertainty associated with the estimates. And most critically, we have the t-statistics associated with the coefficients. What this t-statistic, and the significance that goes with it, tells us is that this coefficient, minus 377, is in fact statistically different from zero. In other words, this coefficient predicts sales in some way. And the way that we can read this equation is, if we wanted to know what sales were at any given time given a price, we would say sales are a function of beta zero, which is 1092, plus beta one, which is negative 377, times whatever the price might be. We don't know the error, so we leave that alone. So let's say, for example, that we have a price of one dollar. We would expect sales in that given period to be 1092 plus negative 377 times one, which is equal to 715. The other thing to note is that this coefficient is negative, which makes sense. What it means is that for every one dollar increase in price, we expect sales to decrease by about 377 units. And again, that's intuitively correct. So this is the core of regression. And what I'll do in the next video is try to give you some of the intuition as to what's actually going on under the hood.
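The prediction arithmetic above can be written out as a short Python sketch. The two coefficients are the ones quoted in the lecture (1092 and negative 377); the function name `predicted_sales` is a hypothetical helper introduced only for this illustration, and the error term is left out, as in the lecture.

```python
BETA_0 = 1092.0   # intercept (constant) quoted in the lecture
BETA_1 = -377.0   # price coefficient quoted in the lecture

def predicted_sales(price):
    """Expected weekly sales at a given price, ignoring the error term."""
    return BETA_0 + BETA_1 * price

# The lecture's example: a price of one dollar
print(predicted_sales(1.00))                          # 715.0

# Raising the price by one dollar changes the prediction by beta one
print(predicted_sales(2.00) - predicted_sales(1.00))  # -377.0
```

The second line shows why the coefficient is read as the expected change in sales for each one dollar change in price.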