Excel is a powerful tool that has some amazing functions and functionalities when working with statistics.
Finding a correlation between two data series is one of the most common statistical calculation when working with large datasets,
I was working as a financial analyst a few years ago, and although we were not heavily involved in statistical data, finding correlation was something we still had to do quite often.
In this tutorial, I will show you two really easy ways to calculate correlation coefficient in Excel. There is already a built-in function to do this, and you can also use the Data Analysis Toolpak.
So let’s get started!
This Tutorial Covers
What is the Correlation Coefficient?
Since this is not a statistics class, let me briefly explain what is the correlation coefficient, and then we’ll move on to the section where we calculate the correlation coefficient in Excel.
A correlation coefficient is a value that tells you how closely two data series are related.
A commonly used example is the weight and height of 10 people in a group. If we calculate the correlation coefficient for the height and weight data for these people, we will get a value between -1 and 1.
A value less than zero indicates a negative correlation, which means that if the height increases then the weight decreases, or if the weight increases at then the height decrease.
And a value more than zero indicates a positive correlation, which means that if the height increases then the weight increases, and if the height decreases then the weight decreases.
The closer the value is to 1, the stronger is the positive correlation. So a value of .8 would indicate that the height and weight data are strongly correlated.
Now, let’s see how to calculate this correlation coefficient in Excel.
Calculating Correlation Coefficient in Excel
As I mentioned, there are a couple of ways you can calculate the correlation coefficient in Excel.
Using CORREL Formula
CORREL is a statistics function that was introduced in Excel 2007.
Suppose you have a data set as shown below where you want to calculate the correlation coefficient between the height and the weight of 10 people.
Below is the formula that would do this:
The above CORREL function takes two arguments – the series with the height data points and the series with the weight data points.
And that’s it!
As soon as you hit enter, Excel does all the calculations in the back-end it gives you one single Pearson correlation coefficient number.
In our example, that value is a little over .5, which indicates that there is a fairly strong positive correlation.
This method is best used if you have two series and all you want is the correlation coefficient.
But if you have multiple series and you want to find out the correlation coefficient of all these series, then you can also consider using the data analysis tool pack in Excel (covered next)
Using the Data Analysis Toolpak
Excel has a Data Analysis Toolpak that can be used to quickly calculate various statistics values (including getting the correlation coefficient).
But the Data Analysis Toolpak is disabled by default in Excel. So the first step would be to enable the data analysis tool back and then use that to calculate the Pearson correlation coefficient in Excel.
Enabling the Data Analysis Toolpak
Below are the steps to enable the Data Analysis Toolpak in Excel:
- Click the File tab
- Click on Options
- In the Excel Options dialog box that opens up, click on the Add-ins option in the sidebar pane
- In the Manage drop-down, select Excel add-ins
- Click on Go. This will open the add-ins dialog box
- Check the Analysis Toolpak option
- Click on Ok
The above steps would add a new group in the Data tab in the Excel ribbon called Analysis. Within this group, you would have the Data Analysis option
Calculating the Correlation Coefficient Using Data Analysis Toolpak
Now that you have the analysis tool back available in the ribbon, let’s see how to calculate the correlation coefficient using it.
Suppose you have a data set as shown below and you want to find out the correlation between the three series (height and weight, height and income, and weight and income)
Below are the steps to do this:
- Click the Data tab
- In the Analysis group, click on the Data Analysis option
- In the Data Analysis dialog box that opens up, click on ‘Correlation’
- Click OK. This will open the Correlation dialog box
- For input range, select the three series – including the headers
- For ‘Grouped by’, make sure ‘Columns’ is selected
- Select the option – ‘Label in First Row’. This will make sure that in the resulting data would have the same headers and it would be a lot easier to understand the results
- In the Output options, choose where you want the resulting table. I’m going to go with cell G1 on the same worksheet. You can also choose to get your results in a new worksheet or a new workbook
- Click OK
As soon as you do this, Excel would calculate the correlation coefficient for all the series and give you a table as shown below:
Note that the resulting table is static, and would not update in case any of the data points in your table change. In case of any change, you will have to repeat the above steps again to generate a new table of correlation coefficients.
So these are two quick and easy methods to calculate correlation coefficient in Excel.
I hope you found this tutorial useful!
Other Excel tutorials you may also like:
- How to Calculate Standard Deviation In Excel (Step-by-Step)
- How to Find Outliers in Excel (and how to handle these)
- Calculating Moving Average in Excel [Simple, Weighted, & Exponential]
- How to Calculate Compound Annual Growth Rate (CAGR) in Excel
- One Variable Data Table in Excel
- Two Variable Data Table in Excel
- Scenario Manager in Excel
- Using Solver in Excel
- How to Get Descriptive Statistics in Excel?
- Calculate the Coefficient of Variation (CV) in Excel