Econometrics Project
Prepared for Module CB9016
“Applied Econometrics”
by
Carlos Ferreira
Submitted on the 16th March, 2009
1a)
Looking at the dataset first, we see one variable that measures total production (out) and a
number of variables giving the quantities of inputs used. We also note that total capital
expenditure is not included as one variable, but as several (fert, fodd, mach and cap). Finally, the
variables age, soilc and soils are not continuous, suggesting their use as dummy variables.
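
As an illustration of how a non-continuous variable such as soilc could be turned into dummies, the
following is a minimal sketch. The category codes in the example are hypothetical (the coding used
in the dataset is not shown here), and dropping the lowest code as the reference category is our
assumption, made to avoid perfect collinearity with the intercept.

```python
def make_dummies(codes, prefix):
    """Build one 0/1 column per category, omitting the lowest code.

    The omitted category acts as the reference group (an assumption,
    not something taken from the dataset documentation).
    """
    levels = sorted(set(codes))
    return {
        f"{prefix}_{level}": [1 if code == level else 0 for code in codes]
        for level in levels[1:]  # skip the reference category
    }


# Hypothetical soil-class codes for four farms
soilc = [1, 2, 3, 2]
dummies = make_dummies(soilc, "soilc")
for name, column in dummies.items():
    print(name, column)
```

With k categories this yields k - 1 dummy columns, each interpreted relative to the omitted group.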
To examine the likely direction and magnitude of the impact of each regressor on the
regressand, we plotted each of the variable pairs. The plots revealed two cases that are consistently
outliers. Although roughly in line with the expected regression curves, the two largest farms
show very large outputs and very large usage of inputs when compared to the rest of the sample,
lying more than four standard deviations from the mean. As a result, and at the risk of
over-reacting to a potentially small problem, we chose to eliminate these two cases from the analysis.
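
The screening rule described above (cases lying more than four standard deviations from the
sample mean) can be sketched as follows. This is only an illustration: the function name, the use
of the sample standard deviation and the example figures are our assumptions, not the actual data.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=4.0):
    """Return the indices of observations whose distance from the mean
    exceeds `threshold` sample standard deviations."""
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # all observations identical, nothing to flag
    return [
        i for i, v in enumerate(values)
        if abs(v - centre) / spread > threshold
    ]


# Hypothetical output figures: 60 ordinary farms plus two very large ones
out = [10 + (i % 5) for i in range(60)] + [200, 220]
extreme = flag_outliers(out)
trimmed = [v for i, v in enumerate(out) if i not in extreme]
print(extreme, len(trimmed))
```

Because the mean and standard deviation are themselves inflated by extreme cases, a rule of this
kind is conservative, and it should be read alongside the plots rather than in place of them.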
For the variable land, we expect a strong, positive and linear link with output. The same
applies to the variable labour, but in this case we expect the coefficient to be higher than the one for
land. We also expect a large, positive relation between fertilizer and output. In the case of fodder,
the plot shows that some farmers use it, while others do not. This might result in
pronounced heteroscedasticity if fodder were included as a stand-alone variable in the model. The
best use of this variable is therefore as part of a composite total capital variable. In the case of
machinery, we expect a strong positive relation with output as well.
We created a total capital variable, tc:
tc_i = fert_i + fodd_i + mach_i + cap_i
The variable tc accounts for the total capital expenditure on the farm. Plotting tc against
output suggests a strong, positive relation.
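
The construction of tc, and a simple numerical check of its relation with output, can be sketched
as below. The figures are invented for illustration only, and the correlation coefficient is our
addition as a complement to the plot, not a result from the actual sample.

```python
from math import sqrt


def total_capital(fert, fodd, mach, cap):
    """Sum the four capital components farm by farm."""
    return [f + d + m + c for f, d, m, c in zip(fert, fodd, mach, cap)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for three farms (note that fodder is zero for two of them)
fert = [1, 2, 3]
fodd = [0, 0, 5]
mach = [2, 2, 2]
cap = [1, 1, 1]
out = [40, 52, 110]

tc = total_capital(fert, fodd, mach, cap)
print(tc, pearson(tc, out))
```

Summing the components means that farms with zero fodder still contribute a positive tc, which
is the reason for preferring the composite variable over fodder on its own.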
As for the variable age of the farmer, an analysis