Time Series Analysis of RTA Boardings, Route 16 (2008-2015)
November 20, 2016
This is a condensed version of an undergrad project for my Time Series analysis class at UCR. Files can be found at the rta-routes repository on GitHub.
# Open TSA librarylibrary(TSA)# Set to directory where data files are# setwd("~/Documents/rta-routes/data/")# Import data files for routes 1 and 16; each file contains monthly numbers of ridersroute.16<-ts(read.table(file="data/route16.txt"),start=c(2008,7),frequency=12)
Although there are many bus routes, we are interested in Route 16 mainly due to the fact that it runs through UCR (University of California - Riverside). We create a time series data frame route.16 from the given .txt files. It contains monthly counts of how many riders take this bus route.
1
2
# Plot time seriesplot(route.16,ylab="Number of Riders",xlab="Year",main="Riders Per Month - Route 16",type="o")
The time series plot suggest we can apply a seasonal model since there is a pattern. It also shows that there is an increasing trend over time, especially between years 2008-2013.
Model Specification
Data suggests an ARIMA(p,q,d) model. In order to reach stationarity, we look at possible transformations through a Box-Cox transformation with a 95% confidence interval and λ= 0.5.
Transformation
The output suggests a square root transformation.
1
2
3
4
5
# Create subset of route 16 from Jan 2008 to Feb 2015route.16.window<-window(route.16,end=c(2015,2))# Use Box-Cox transformation to determine which power transf to use# BoxCox.ar(route.16.window)
Below is the plot of the transformed data:
1
plot(sqrt(route.16.window),ylab="Passengers",xlab="Years",main="Squareroot transformation of Riders per month line 16")
Differentiation
The plot of the transformed data still shows stationarity, so we plot the first degree difference of the transformation. In addition to the stationarity, the plot still shows an increasing trend.
1
2
# Differentiationplot(diff(sqrt(route.16.window),xlab="Years",ylab="First difference",main="First degree difference of the sqrt transformation"))
The Augmented Dickey-Fuller Test verifies the stationarity of the differentiated data with a p-value of 0.01. This value is sufficient to sustain the hypothesis that our data is stationary with a signficance level of α = 0.05.
1
2
3
4
5
6
7
8
9
# Testing Stationarityadf.test(diff(sqrt(route.16.window)))#### Augmented Dickey-Fuller Test#### data: diff(sqrt(route.16.window))## Dickey-Fuller = -6.8511, Lag order = 4, p-value = 0.01## alternative hypothesis: stationary
We can now proceed to find the appropriate ARMA(p,q) model by using the sample ACF and PACF with a lag of 12:
The Sample ACF resembles one of a seasonal MA(1) - it spikes at lag k = 1 and then drops below the margin of error. The Sample PACF also has a similar output.
Both figures suggest that an ARIMA(1,1,1) model is most suitable for our data. To confirm this, we can also look at the EACF:
1
2
3
4
5
6
7
8
9
10
11
12
13
# Sample EACFeacf(as.vector(diff(sqrt(route.16.window),lag=12)))## AR/MA## 0 1 2 3 4 5 6 7 8 9 10 11 12 13## 0 x x o x x o o o o o o o o o## 1 x o o o o o o o o o o o o o## 2 x o o o o o o o o o o o o o## 3 x x x o o o o o x o o o o o## 4 x o o x o o o o o o o o o o## 5 x o o x o o o o o o o o o o## 6 o x o o o o o o o o o o o o## 7 o x o o x o o o o o o o o o
The tip of the EACF is reached at the point (1,1), creating a wedge from there. In addition to the ARIMA(1,1,1) model, another suggested model for the data is an ARI(1,1) model since the sample ACF does not immediately drop below the margin of error. Different results can be obtained if the MA(1) of the ARIMA model is left out.
Model Fitting
Since we want to estimate how many people ride the bus each month, we will test two different approaches of our model, one with seasonality and one without it.
With ML estimates as \( \sigma^2_{e} \)≈ 315.9, \( \hat\phi \) = 0.3146, \( \hat\theta \) = -0.8660. The approximate 95% confidence intervals for \( \hat\phi \) and \( \hat\theta \) is (0.065872, 0.543328) and (-0.961452,-0.770548), respectively. We found that \( \hat\phi \) and \( \hat\theta \)are significantly different from 0.
#ARI(1,1)par(mfrow=c(2,2))plot(route.16,ylab="Number of Riders",xlab="Year",main="Riders Per Month -Route 16",type="o")plot(rstandard(ari11.ns.fit),ylab="Std. Residuals",xlab="Time",type="o")abline(h=0)hist(rstandard(ari11.ns.fit),xlab="Std. Residuals",main="")qqnorm(rstandard(ari11.ns.fit),main="")qqline(rstandard(ari11.ns.fit),main="")
#ARIMA(1,1,1)par(mfrow=c(2,2))plot(route.16,ylab="Number of Riders",xlab="Year",main="Riders Per Month - Route 16",type="o")plot(rstandard(arima111.ns.fit),ylab="Std. Residuals",xlab="Time",type="o")abline(h=0)hist(rstandard(arima111.ns.fit),xlab="Std. Residuals",main="")qqnorm(rstandard(arima111.ns.fit),main="")qqline(rstandard(arima111.ns.fit),main="")
#ARI(1,1)par(mfrow=c(2,2))plot(route.16,ylab="Number of Riders",xlab="Year",main="Riders Per Month -Route 16",type="o")plot(rstandard(ari11.fit),ylab="Std. Residuals",xlab="Time",type="o")abline(h=0)hist(rstandard(ari11.fit),xlab="Std. Residuals",main="")qqnorm(rstandard(ari11.fit),main="")qqline(rstandard(ari11.fit),main="")
To test our model, we withold observations from February 2015 to February 2016. The table below summarizes the actual values in March 2015 and January 2016 and the predictions estimated by the four ARIMA and ARI models.