# Serial Correlation

**Testing for serial correlation.** A very simple strategy for testing for serial correlation is to check the magnitude and significance level of the estimated autocorrelation coefficient of the residuals. Economists who deal with time-series data often prefer the sophisticated-yet-unintuitive Durbin-Watson test instead.

### Compute Sample ACF and PACF

**Serial correlation and a lagged dependent variable.** Consider the simple model

y_t = β₀ + β₁ y_{t−1} + ε_t    (6)

(although the result holds for more complicated models). Suppose ε_t satisfies all of our standard classical linear regression model assumptions except that it is serially correlated, that is, cov(ε_t, ε_{t−1}) ≠ 0.
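The consequence of combining a lagged dependent variable with serially correlated errors can be seen in a small simulation. This is a minimal sketch with illustrative parameter values (β₁ = 0.5, ρ = 0.7 are assumptions, not from the text): because ε_t is correlated with the regressor y_{t−1} through ε_{t−1}, OLS is inconsistent.

```python
import numpy as np

# Simulate y_t = b0 + b1*y_{t-1} + e_t with AR(1) errors e_t = rho*e_{t-1} + u_t,
# then estimate b1 by OLS. Parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
n, b0, b1, rho = 20_000, 0.0, 0.5, 0.7

u = rng.normal(size=n)
e = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]
    y[t] = b0 + b1 * y[t - 1] + e[t]

# OLS of y_t on a constant and y_{t-1}
X = np.column_stack([np.ones(n - 1), y[:-1]])
beta_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]

# Because e_t is correlated with the regressor y_{t-1} (through e_{t-1}),
# OLS is inconsistent: plim(b1_hat) = (b1 + rho) / (1 + b1*rho) ≈ 0.89 here,
# well above the true b1 = 0.5.
print(round(beta_hat[1], 2))
```

The upward bias here is exactly why residual autocorrelation cannot be ignored when the regressors include a lagged dependent variable.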

This example shows how to compute the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) to qualitatively assess autocorrelation.

The time series is 57 consecutive days of overshorts from a gasoline tank in Colorado.

**Step 1. Load the data.**

Load the time series of overshorts.

The series appears to be stationary.
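A sketch of this loading step follows. The overshorts data set itself is not included with this text, so the code below stands in for it with a simulated MA(1) series of the same length (57 observations); the MA(1) form and its coefficient are assumptions chosen to mimic the behavior the analysis below finds in the real data.

```python
import numpy as np

# Step 1 sketch: stand-in for the 57-day overshorts series (the real data
# is not included here). An MA(1) process x_t = u_t - 0.8*u_{t-1} is used
# as an assumed substitute; theta = -0.8 is illustrative.
rng = np.random.default_rng(42)
n = 57
u = rng.normal(size=n + 1)
overshorts = u[1:] - 0.8 * u[:-1]

print(overshorts.shape)
```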

**Step 2. Plot the sample ACF and PACF.**

Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF).

The sample ACF and PACF exhibit significant autocorrelation. The sample ACF has significant autocorrelation at lag 1. The sample PACF has significant autocorrelation at lags 1, 3, and 4.

The distinct cutoff of the ACF combined with the more gradual decay of the PACF suggests an MA(1) model might be appropriate for this data.

**Step 3. Store the sample ACF and PACF values.**

Store the sample ACF and PACF values up to lag 15.

The outputs `acf` and `pacf` are vectors storing the sample autocorrelation and partial autocorrelation at lags 0, 1, ..., 15 (16 values in total).

Here is a simple trick that can solve a lot of problems.

You cannot trust a linear or logistic regression performed on data whose error terms (residuals) are auto-correlated. There are different approaches to de-correlate the observations, such as generalized least squares, but they usually involve introducing a new matrix to take care of the resulting bias.


A radically different and much simpler approach is to re-shuffle the observations randomly. If that does not take care of the issue (the auto-correlations are weakened but remain significant after re-shuffling), it means that something is fundamentally wrong with the data set, perhaps with the way the data was collected. In that case, cleaning the data or getting new data is the solution. But usually, re-shuffling, if done randomly, will eliminate these pesky correlations.

**The trick**

Reshuffling is done as follows:

- Add one column to your data set, consisting of pseudo random numbers, for instance generated with the function RAND in Excel.
- Sort the entire data set (all the columns, plus the new column containing the pseudo random numbers) according to the values in the newly added column.

Then do the regression again and look at improvements in model performance. R-squared may not be a good indicator; techniques based on cross-validation should be used instead.
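The two-step reshuffle above can be sketched in pandas rather than Excel (an assumed tooling choice; the column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy data set with hypothetical column names.
rng = np.random.default_rng(7)
df = pd.DataFrame({"x": np.arange(10.0), "y": np.arange(10.0) * 2 + 1})

# Step 1: add a column of pseudo-random numbers (the Excel RAND equivalent).
df["_rand"] = rng.random(len(df))

# Step 2: sort the entire data set by that new column, then drop it.
shuffled = df.sort_values("_rand").drop(columns="_rand").reset_index(drop=True)

# Same rows, different order: row-wise content is preserved.
assert sorted(shuffled["x"]) == sorted(df["x"])
```

pandas also offers `df.sample(frac=1)` as a one-liner that performs the same row shuffle.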


Actually, any regression technique where the order of the observations does not matter will not be sensitive to these auto-correlations. If you want to stick to standard, matrix-based regression techniques, then re-shuffling all your observations 10 times (to generate 10 new data sets, each with the same observations but ordered differently) is the solution. You will end up with 10 different sets of estimates and predictors, one for each data set. Compare them: if they differ significantly, there is something wrong in your data, unless auto-correlations are expected, as in time series models (in that case, you may want to use techniques adapted to time series instead).
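The 10-reshuffle comparison can be sketched as follows, on a toy data set with illustrative names and values. Note that ordinary least squares itself is invariant to row order, so for OLS the 10 sets of estimates agree to machine precision; large differences across reshuffles would signal a problem with the data or the fitting procedure.

```python
import numpy as np

# Toy data: a clean linear relationship with small noise (illustrative).
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=50)

slopes = []
for _ in range(10):
    order = rng.permutation(50)          # one reshuffled data set
    xs, ys = x[order], y[order]
    slope, intercept = np.polyfit(xs, ys, 1)
    slopes.append(slope)

# OLS does not depend on row order, so the 10 slopes agree up to
# floating-point noise.
spread = max(slopes) - min(slopes)
print(spread < 1e-9)
```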


**Testing for auto-correlations in the observations**


If you have *n* observations and *p* variables, there is no single global auto-correlation coefficient that measures the association between one observation and the next. One way around this is to compute a lag-1 auto-correlation coefficient for each variable (column) separately, giving you *p* coefficients; then look at the minimum and the maximum of these *p* coefficients in absolute value (are they high?). You can also check lag-2, lag-3 auto-correlations, and so on. While auto-correlation between observations is not the same as auto-correlation between residuals, the two are linked, and it is still a useful indicator of the quality of your data. For instance, if the data comes from sampling and consists of successive blocks of observations, each block corresponding to a segment, then you are likely to find auto-correlations both in the observations and in the residuals. The same issue can arise from a data glitch in which some observations are duplicated.
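The per-column check described above can be sketched as follows; the data matrix, its dimensions, and the helper name are illustrative assumptions. One column is deliberately made strongly autocorrelated (a random walk) so the diagnostic has something to flag.

```python
import numpy as np

def lag1_autocorr(col: np.ndarray) -> float:
    """Lag-1 auto-correlation: correlation between col[1:] and col[:-1]."""
    return float(np.corrcoef(col[1:], col[:-1])[0, 1])

# Toy n-by-p data matrix (names and values are illustrative).
rng = np.random.default_rng(3)
n, p = 200, 3
data = rng.normal(size=(n, p))
data[:, 0] = np.cumsum(data[:, 0])   # make one column strongly autocorrelated

coefs = [lag1_autocorr(data[:, j]) for j in range(p)]
worst = max(abs(c) for c in coefs)   # the headline number to inspect
print(round(worst, 2))
```

Extending the helper to lag-2, lag-3, and so on only requires replacing the slice offsets (`col[k:]` against `col[:-k]`).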
