problem-10.1
We begin by loading the MASS package, which provides the Cars93 data set; the command names(Cars93) lists its variables.
> library(MASS) # loads data set
For the model of highway mileage by horsepower we expect a negative
correlation. A scatterplot confirms this.
> plot(MPG.highway ~ Horsepower, data = Cars93)
> res = lm(MPG.highway ~ Horsepower, data = Cars93)
> res
Call:
lm(formula = MPG.highway ~ Horsepower, data = Cars93)
Coefficients:
(Intercept)   Horsepower
     38.150       -0.063
> predict(res, newdata=data.frame(Horsepower=225))
[1] 23.97
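The predicted value is just the fitted line evaluated at Horsepower = 225. As a minimal sketch (the names fit, b, by_hand, and from_predict are ours, not part of the exercise), the same number can be recovered directly from the coefficients:

```r
library(MASS)                        # provides Cars93

fit <- lm(MPG.highway ~ Horsepower, data = Cars93)
b <- coef(fit)                       # named vector: (Intercept), Horsepower

# Evaluate intercept + slope * 225 by hand
by_hand <- unname(b["(Intercept)"] + b["Horsepower"] * 225)

# The same prediction from predict()
from_predict <- unname(predict(fit, newdata = data.frame(Horsepower = 225)))

all.equal(by_hand, from_predict)     # the two computations should agree
```

Using coef() rather than retyping the rounded values 38.150 and -0.063 avoids rounding error in the hand calculation.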
Highway mileage and automobile weight should show a similar
negative correlation. Again we confirm this with a scatterplot and
make the requested predictions.
> f = MPG.highway ~ Weight
> plot(f, data=Cars93)
> res = lm(f, data=Cars93)
> res
Call:
lm(formula = f, data = Cars93)
Coefficients:
(Intercept)       Weight
   51.60137     -0.00733
> predict(res, newdata=data.frame(Weight=c(2524, 6400)))
     1      2
33.108  4.708
The prediction for the MINI Cooper may be close, but there is no
reason to expect the prediction for the HUMMER to be close, as the
value of the predictor is outside the range of the data.
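Whether a prediction is an extrapolation can be checked by comparing the new predictor values with the observed range. The following is a sketch only; the helper in_range() and the labels MINI and HUMMER are hypothetical names introduced here for illustration:

```r
library(MASS)                        # provides Cars93

# Hypothetical helper: TRUE where a new value lies inside the observed range
in_range <- function(new, observed) {
  r <- range(observed, na.rm = TRUE)
  new >= r[1] & new <= r[2]
}

new_weights <- c(MINI = 2524, HUMMER = 6400)   # weights used in the predictions

range(Cars93$Weight)                 # smallest and largest weight in the data
inside <- in_range(new_weights, Cars93$Weight)
inside                               # FALSE flags an extrapolation
```

A value flagged FALSE is one where the fitted line is being used beyond the data that produced it, so the linear relationship need not hold there.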
The variable Min.Price records the price of the stripped-down
version of the car, and Max.Price records the price of the fully
equipped version. We'd expect Max.Price to be roughly a fixed
amount more than Min.Price, that is, a slope near 1, as the
differences (the cost of leather seats or a bigger engine, perhaps)
are roughly the same for each car. Checking, we have:
> f = Max.Price ~ Min.Price
> plot(f, data=Cars93)
> res = lm(f, data=Cars93)
> abline(res)
> res
Call:
lm(formula = f, data = Cars93)
Coefficients:
(Intercept)    Min.Price
       2.31         1.14
The slope of 1.14 suggests that add-ons may cost more for more
expensive cars, but here the excess appears to be due to one large
outlier, as the robust regression estimate of the slope is much closer to 1:
> rlm(f, data=Cars93)
Call:
rlm(formula = f, data = Cars93)
Converged in 7 iterations
Coefficients:
(Intercept)    Min.Price
      3.609        1.029
Degrees of freedom: 93 total; 91 residual
Scale estimate: 3.18
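To see the two fits side by side and to locate the point responsible for the difference, something like the following could be used. It is a sketch, not part of the original solution; the names fit_ls, fit_rob, and worst are ours, and "largest absolute least-squares residual" is only one reasonable way to pick out the outlier:

```r
library(MASS)                        # provides Cars93 and rlm()

f <- Max.Price ~ Min.Price
fit_ls  <- lm(f, data = Cars93)      # least-squares fit
fit_rob <- rlm(f, data = Cars93)     # robust fit, default settings

# Coefficients of the two fits, one row per method
rbind(lm = coef(fit_ls), rlm = coef(fit_rob))

# Observation with the largest absolute least-squares residual
worst <- which.max(abs(resid(fit_ls)))
Cars93[worst, c("Make", "Min.Price", "Max.Price")]
```

Because rlm() gives less weight to observations with large residuals, a single extreme point influences its slope less than it influences the least-squares slope.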
A scatterplot matrix may reveal additional linear relationships. One
is produced with the pairs() command, as in pairs(Cars93), but
applying it to the full data frame produces too many scatterplots to
read. We can trim the data frame down first and then plot again.
Keeping only the nonfactor variables can be done as follows:
> cars = Cars93[,sapply(Cars93, function(x) !is.factor(x))]
> pairs(cars)
In the resulting plots we see, for example, that variables 1 and 2,
2 and 3, and 4 and 5, among others, are linearly related. The variable
names are printed along the diagonal of the graphic and can be read
there if the monitor is large enough; otherwise the command
names(cars) lists them in order.
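When the scatterplot matrix is too dense to read, a correlation matrix gives a numerical summary of the same pairs. The sketch below lists the pairs whose correlation exceeds a cutoff; the cutoff of 0.9 and the names r, idx, and strong are arbitrary choices made here for illustration:

```r
library(MASS)                        # provides Cars93

cars <- Cars93[, sapply(Cars93, function(x) !is.factor(x))]

# Some columns contain missing values, so use pairwise complete cases
r <- cor(cars, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and select strongly correlated pairs
idx <- which(upper.tri(r) & abs(r) > 0.9, arr.ind = TRUE)

strong <- data.frame(var1 = rownames(r)[idx[, "row"]],
                     var2 = colnames(r)[idx[, "col"]],
                     r    = round(r[idx], 2))
strong
```

A high correlation only suggests a linear relationship, so the pairs listed here are still worth checking in the corresponding panels of pairs(cars).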