Question:

I'm using the following to estimate the coefficient of determination in MATLAB:

```
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
X2 = X(:,3);
mdl = fitlm(X2,y);
Estimated Coefficients:
                   Estimate       SE        tStat       pValue
                   ________    ________    ______    __________
    (Intercept)      116.72      3.9389    29.633    1.0298e-50
    x1             0.039357    0.025208    1.5613       0.12168
Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 6.66
R-squared: 0.0243, Adjusted R-Squared 0.0143
F-statistic vs. constant model: 2.44, p-value = 0.122
```

If I'm only using one predictor variable in the linear model, why aren't the R<sup>2</sup> and adjusted-R<sup>2</sup> values the same? These should be interchangeable if there is only one predictor in the model. What am I missing here?

Answer1: Wikipedia gives two definitions for adjusted-R<sup>2</sup>:

<a href="https://i.stack.imgur.com/JNJVW.png" rel="nofollow"><img alt="enter image description here" class="b-lazy" data-src="https://i.stack.imgur.com/JNJVW.png" data-original="https://i.stack.imgur.com/JNJVW.png" src="https://etrip.eimg.top/images/2019/05/07/timg.gif" /></a>

and

<a href="https://i.stack.imgur.com/yt83Y.png" rel="nofollow"><img alt="enter image description here" class="b-lazy" data-src="https://i.stack.imgur.com/yt83Y.png" data-original="https://i.stack.imgur.com/yt83Y.png" src="https://etrip.eimg.top/images/2019/05/07/timg.gif" /></a>

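I'm guessing that your assertion that R<sup>2</sup> should equal adjusted-R<sup>2</sup> is based on that first equation, reading the numerator of the second term as <em>0</em> when <em>p</em> is <em>1</em>. That only works if <em>p</em> counts the intercept as well as the slope. If <em>p</em> is the number of predictors excluding the constant term, which is how I read the Wikipedia definition (disappointingly there are no citations in this section of the wiki article), the numerator is <em>1</em> for your model and the term only vanishes when <em>p</em> is <em>0</em>. Read that way, the first equation is not an approximation: it is the second equation rewritten using R<sup>2</sup> = 1 - (residual sum of squares) / (total sum of squares).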

The second equation, which aside from the notation also matches equation 6.4 on page 212 of <a href="http://www-bcf.usc.edu/~gareth/ISL/" rel="nofollow">Introduction to Statistical Learning</a>, will differ from R<sup>2</sup> because <em>df<sub>e</sub></em> is <em>n - p - 1</em> whereas <em>df<sub>t</sub></em> is just <em>n - 1</em>. With only one explanatory variable (<em>p</em> equal to <em>1</em>) the two degrees of freedom still differ by <em>1</em>, so adjusted-R<sup>2</sup> comes out slightly below R<sup>2</sup>, although the difference should be pretty small. <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" rel="nofollow">Here is an example</a> with a table of R<sup>2</sup> and adjusted-R<sup>2</sup> values showing the difference even when the number of variables is 1.
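As a quick sanity check of the degrees-of-freedom argument, the numbers printed in the question can be plugged into the second definition. Because 1 - R<sup>2</sup> is the ratio of the residual to the total sum of squares, that definition can be written as 1 - (1 - R<sup>2</sup>) * df<sub>t</sub> / df<sub>e</sub>. The sketch below assumes <em>p</em> counts only the predictor (not the intercept) and uses the rounded R<sup>2</sup> from the output, so the result is only good to about four decimal places.

```matlab
% Values read off the fitlm output in the question
n  = 100;      % number of observations
p  = 1;        % number of predictors, NOT counting the intercept (assumption)
R2 = 0.0243;   % reported R-squared (already rounded)

dfe = n - p - 1;   % error degrees of freedom (98, as reported)
dft = n - 1;       % total degrees of freedom

% Second definition, rewritten in terms of R-squared
adjR2 = 1 - (1 - R2) * dft / dfe;

fprintf('adjusted R^2 = %.4f\n', adjR2);   % prints: adjusted R^2 = 0.0143
```

This reproduces the reported 0.0143, so the gap of roughly 0.01 is what the degrees-of-freedom correction gives for <em>n</em> = 100; it only looks large because R<sup>2</sup> itself is so small.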

Your difference may look large relative to R<sup>2</sup> itself, though. I suggest you look at your residual sum of squares and total sum of squares, calculate R<sup>2</sup> and adjusted-R<sup>2</sup> yourself, and check that they match the reported values.
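A minimal sketch of that check, reusing the model from the question. It assumes the Statistics and Machine Learning Toolbox is available and relies on the <code>SSE</code>, <code>SST</code>, <code>DFE</code>, <code>NumObservations</code> and <code>Rsquared</code> properties of the fitted <code>LinearModel</code> object; the variable names <code>R2manual</code> and <code>adjR2manual</code> are just illustrative.

```matlab
% Refit the model from the question
load hospital
y   = hospital.BloodPressure(:,1);
X   = double(hospital(:,2:5));
X2  = X(:,3);
mdl = fitlm(X2, y);

n   = mdl.NumObservations;   % 100 in the question's output
dfe = mdl.DFE;               % error degrees of freedom, n - p - 1
dft = n - 1;                 % total degrees of freedom

% R-squared and adjusted R-squared from the sums of squares
R2manual    = 1 - mdl.SSE / mdl.SST;
adjR2manual = 1 - (mdl.SSE / dfe) / (mdl.SST / dft);

% Compare the hand-computed values with what fitlm reports
fprintf('R^2:          manual %.4f, fitlm %.4f\n', R2manual,    mdl.Rsquared.Ordinary);
fprintf('adjusted R^2: manual %.4f, fitlm %.4f\n', adjR2manual, mdl.Rsquared.Adjusted);
```

If the manual and reported values agree, the difference between R<sup>2</sup> and adjusted-R<sup>2</sup> comes entirely from dividing each sum of squares by its own degrees of freedom.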