Creating a Marketing Strategy with Python: Using Linear Regression for ROI Analysis

Convert Your Audio To Text

4.9/5

3726 customer reviews

Learn to develop a marketing strategy using Python and multiple linear regression to create an ROI table for management, tailored to different hospital sizes.

Python Project - Optimize Marketing Campaigns Regression and Correlation Analysis

Added on 09/30/2024

Speakers

Add new speaker

Speaker 1: Hi, I'm going to show you how to create a marketing strategy with Python where we're going to utilize Multiple linear regression to create an ROI table that we can pass on to management So what let me give you a little bit of background on some of the variables that you're looking at So we are dealing with three account types for a hospital We have a big hospital a medium hospital and a small house hospital There are normal ways in which sales are created, just organically, as you see from that top arrow. And then we have marketing tactics that we can utilize to drive sales, such as phone, email, flyers, and a direct sales visit. However, some of these tactics are going to work with some of the accounts better than others. Not all marketing tactics are equally effective. As you can see here, maybe a big hospital might benefit from using an email and phone strategy. Maybe a small hospital may be better off with a direct sales visit. And maybe a medium hospital is going to be better off just using flyers. So we're going to use Python to figure that out. So now that we're in Google Colab and we have our notebook open, we can see how we're going to tackle this problem. We're going to data load and check the quality. We're going to do some additions with features and a little bit of engineering. We're going to explore our data and do some statistical analysis. And then we're going to go with the final recommendations based on our analysis. we want to know what is the impact of each marketing strategy and sales visit on the total amount collected? And is the same strategy valid for all different client types? So what we're going to do is import the packages that we need, which are pandas for data manipulation, NumPy for some linear algebra and conditionals, Seaborn for our statistical visuals, and scipy.stats in order to do our regression. The next thing we're going to do is we're going to use Google Colab library to import files, and we're going to upload our file. And let me just show you how that works. Let me run both of these. Once you run that, you're going to be prompted to bring in the file that you're looking for. So I'm going to go to choose files. Once you isolate where that file is, you will have to upload it. First thing we need to do is, once we upload that file, we want to make sure that we can read it and see the column. So then you can do read CSV. And now you'll be able to find that file. And now we can look at the head of our data. In the head of our data, it will show us what is the structure, the variables, the columns and rows of the data that we have. So let's explore this. We have a client ID. We have the client type, which represents our account types, small, medium, and large facilities, which we saw in the presentation. Number of customers, the monthly target, zip code, The calendar, the account type, the unit sold, the campaign, email, flyer, phone, sales contact, sales contact two, sales contact three, four, and five. And then we have the number of competition. The amount collected represents a target variable that we are going to use to help to define our marketing strategy. We are going to use these variables here as our independent variables to help formulate that marketing strategy per client type. Now that we understand the data that we have, the structure of the columns, let's dive into some slight feature engineering where we are going to ensure that our calendar date column has the proper date by using the pandas to date time function to make sure and change that string into a date. Once we have that date we can use the DT object to pull out month and year which allows us to create additional columns. Now we are going to do some slight exploratory data analysis. We want to be able to explore and understand the basics of our data, the distribution across the different account types or client types and the difference of sales and account types. Also, we want to do some statistical analysis to answer the key questions, the impact of marketing on sales. We can use correlation, regression, or decision trees for that. We're going to look like, we're going to look at the impact of competition on sales. And then we want to, of course, understand how different types of clients can have different strategies, which is our key questions. So let's understand the distributions we have here. We can see that out of the client type we have, large, small, medium, and private, we can see that the lion's share of our customers are large facilities, then small facilities, and then medium facilities. We can do that with the value counts because we're just counting that particular category. We can continue with our categorical analysis by doing a crosstab where we're looking at two categories, which is client type and number of competition. Then we are able to look at the margins, and we are going to normalize that so we can look at the percentages. So we only have this high competition here is only 16% and 83% is low. Let's continue grouping these and taking the mean. If we group by the number of competition, we can take the mean across all our numerical data we can see that the amount collected for high is double the amount collected for low. We can also see that the units sold are double also. Although we are in the majority of our market is low competition most of our sales are coming from the high competition markets. We can further group by client type and take the mean, and you can see that even though medium facility is not the number one, not the number one most populated in our client resumes, we see that it is bringing in a lot more money than the other markets. We would definitely want to understand what is the marketing strategy around the medium facility and maybe that could be applied to other markets. So now we can dive into our correlation analysis using correlation and I just wanted to kind of show you. you, we can look at the correlation only pertaining to amount collected by using the correlation function and putting a count amount collected in a double brackets and we can see the correlation here to what we find important. Now we know correlation goes from negative one to positive one so we can look at the the correlating factors here and kind of weed out what we don't need. Of course, zip code is not going to be something that is functional, but we can also look at some of the correlating factors that are not important. And ultimately, this exercise is to understand our marketing variables. So let's dive into the actual correlation analysis. Now that we can dive into correlation, we can look at how correlation affects each one of our campaign variables when correlated with our account type. So in this section, we're importing Seaborn, saving as variable SNS. And then we're going to set a palette so that we can create the conditional formatting in the table below. So first we set our palette to green and we make sure that we use a CMAP is true. Then we're going to create our correlation analysis by one, isolating the columns that we deem are important which are our campaign variables and amount collected. We use the correlation function And then we isolate amount collected, and we reset the index. We then change the name of the columns just so we can have a better understanding of what we're doing by passing in the list with the columns method here. And we re-save that data frame. We re-save the data frame again. by eliminating amount collected because we don't need to see amount collected correlated with amount collected because we know that will be 100%. Once we do that, we sort our values by our direct of linear impact. And then we use ascending equals false so we can see that in descending order. And then we set the style background with the precision to two for our decimals. And that will give us the visual that you see here. And this visual shows us that Sales Contact 2 is highly correlated with the account collected and the campaign flyer, Sales Contact 3, and so on. However, this is not broken up by account type, which was our key question, because we want to be able to understand what we should use for each individual account type. So let's take a look at that correlation. It is the same structure with the exception that when we create that data frame, we are going to do a group by a client type, and then we're going to isolate those marketing variables. We're going to continue to utilize the correlation function, then only bring in account collected, reset our index, and then we're going to go through the same optimization that we did above. Now we can see small facility and what the correlation impact is for account collected, amount collected. And we can see that if we isolate one variable, which is sales contact two, we can see it's very different from each facility type. And we will continue to isolate those variables. For example, email is not very effective for large campaigns when we look at correlation. But let's continue to look at causation by moving away from correlation and building a linear regression model. So the way we're going to do that is bring in StatsModels and we're going to save that as a variable SM. We're going to use the formula from StatsModels so that we can write in our actual formula which would be amount collected is related to campaign email, campaign flyer, campaign phone, sales contact, sales contact 2, and so on. So what we're doing is using that formula, y equals mx plus b. So this is essentially a linear prediction problem where we're looking at y, mx plus b, which is the coefficient in our slope. And because we have multiple marketing variables, we're going to have multiple coefficients with our independent variables, which are represented by x. So we're looking at MX1, MX2, MX3, and then we'll get the intercept and so on. Now that formula obviously covers all our particular marketing variables in our OLS formula here. So we're just replicating what we see there. So after a little bit of cleaning where we've replaced some of the empty strings in our data columns, What we're gonna do is use that Stacks Models formula to use a linear regression model where we're trying to predict amount collected and then we are going to look at the different campaign metrics. We're going to make sure that we're sourcing the data that we've put together and then we're fitting that model. Then once we print the result summary, we can see all of our campaign tactics with the coefficient attached to them. And that's why we would have MX. And we can see that in this third column, which gives us what the p-value is, we would want a p-value less than .05 to be 95% confident. So we can see that, for example, across all accounts, we can see that the p-value for phone is quite high. phone is quite high. So what we can do is create a filter and eliminate that. The next step we want to be able to do is run this equation for each account type which we can do with a for loop. So we would cycle through the list of account types and we would run this printout with the filter that we only want variables that pass that significance test and what we get is a breakdown of of each one of our account types and the particular marketing tactic with the coefficient attached to it. So we can see for sales contact and sales contact 2 is very important for a small facility while sales contact 1 is important for our large facility. And the campaign flyer, which seems to work a little bit better for a medium facility. Then what we want to be able to do is create a data frame and not print this out. So we're going to run that for loop again, but in this case, we are just going to append each one of these results into a data frame using the append. And now you can see each one of those. If we present the summary, we can see it a little bit better. And finally, we have our data frame here that we can use, that we saw in the beginning of the presentation, that allows us to do some analysis. So we can see for private facility, for each dollar spent on our sales contact, we should Expect six dollars back large facility. We use sales contact one We should expect almost twelve dollars back which each dollar investment We can definitely do some some optimization here where we look at a heat map on each one but what I wanted to do here is just format these into dollars because I think this makes much more sense to our stakeholders, so I created a function where I apply the a dollar sign and only brought in two decimals and I applied that to our column which I renamed return on investment. And then what we end up here is a table for our marketing tactics. So that allows you to create a campaign. So for example, for our private facility, sales contact would be the one variable that we focus on. However, for a large facility, we can see we wanna put a little bit more effort into the sales contact because it has a higher level of success for large facilities. So when we create that campaign, we should reserve more money for this because we can see the return on investment is quite high for each dollar spent. And we can coordinate this to create a nice marketing campaign for our large facility that seems to work across different tactics. However, we can see that the money spent on phone is negative. For the small facility, we can see we are only getting 80 cents back per dollar spent, and also not losing anything on phone, but it's neutral. For our medium facility, Campaign Flyer works well and our Sales Contact 1, 2, and 3 also drive a return on investment. So we want to be able to tell our stakeholders where to invest more of their dollars. So from our analysis, we could see that the sales visit has a very significant effect on the amounts collected. However, we need to take in consideration that this particular marketing tactic may be further down in the sales cycle, which is making it more significant. For example, maybe the customer receives an email and then has some conversation over the phone, and then later the salesman is able to close the deal. So we need to consider that. Additionally, we need to consider that there may be a synergy between multiple tactics that we are not picking up in our model, so that may warrant further analysis. However, we can see from the current analysis which of our campaigns work together. For example, we can see that flyer was significant for big and medium hospitals, and sales visit was significant for each hospital, but however, the particular sales visit was different for each of our account types. Additionally, the phone for the small hospital didn't have any negative effect, which may indicate that it is intertwined with the sales visit and other campaigns. So this is just a quick and easy way to do an analysis of marketing tactics to see where you should invest more dollars on particular marketing tactics based on our regression analysis in Python. If you have any questions or concerns, please leave them in the comment. If you found this of any value, don't hesitate to like and share.