Analysis of Cross-Sectional Data
Since we only know how to deal with cross-sectional data, we took into consideration a single year: the analysis itself is carried out properly, but in real terms the results could turn out to be biased.
4. The data were downloaded from AIDA and refer to Italian companies. Starting from this universe we selected the listed ones (407 companies) and then filtered for those that made an acquisition in 2020. We then added all the explanatory variables described at the beginning of this paper. Hence, our dataset covers only one year, whereas we should ideally have used at least 5 years; this choice may limit our analysis because we are not considering historical data.
5. At the beginning we dropped many companies because they had a large amount of missing data. After this cleaning phase we were left with:
- 260 rows
- 20 columns
Missing data (a sketch of how these can be inspected follows this list):
- Ros19, Ros20: 1 missing value each
- DR20: 51 missing values
- DR19: 36 missing values
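A minimal Stata sketch of this missing-data inspection, assuming the dataset has already been imported from AIDA with the variable names used below; the exact cleaning criterion applied in the paper is not reported, so the drop rule shown here is only illustrative:

* Tabulate the number of missing values for the affected indicators
misstable summarize Ros19 Ros20 DR19 DR20
* Illustrative cleaning rule: drop firms with no profitability information at all
drop if missing(Roe19) & missing(Roe20) & missing(Roi19) & missing(Roi20)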
Here we have a descriptive table with all our variables.
We have 1 string variable, 17 numeric variables and 2 dummy variables.
Variable | Obs | Mean | Std. dev. | Min | Max
---|---|---|---|---|---
Ragionesoc~e | 0 | | | |
Employees | 260 | 2433.042 | 10886.12 | 5 | 113847
Roe20 | 260 | 5.0925 | 16.32248 | -124.63 | 54.59
Roe19 | 260 | 8.507231 | 17.12697 | -127.7 | 70.95
Roi20 | 260 | 3.140462 | 7.891158 | -19.74 | 29.8
Roi19 | 260 | 6.065038 | 8.368215 | -17.53 | 29.48
ER20 | 260 | 9.205962 | 13.30042 | -38.96 | 76.54
ER19 | 260 | 11.72862 | 12.55022 | -34.43 | 78.64
Ros20 | 259 | .8579151 | 12.41545 | -47.24 | 28.26
Ros19 | 259 | 4.810502 | 10.71554 | -44.14 | 27.18
IR20 | 260 | 3.939077 | 8.177715 | 0 | 62.24
IR19 | 260 | 4.111115 | 11.46785 | 0 | 84.53
DFF20 | 260 | .9786923 | .9608936 | .07 | 7.86
DFF19 | 260 | 1.205846 | 1.727082 | .04 | 15.3
IDL20 | 260 | .4228077 | .2108418 | 0 | .93
IDL19 | 260 | .3487308 | .2075476 | 0 | .9
DR20 | 209 | 36.15895 | 24.43991 | 0 | 95.59
DR19 | 224 | 26.53701 | 20.48292 | 0 | 96.86
M | 260 | .4769231 | .5004305 | 0 | 1
D | 260 | .4846154 | .5007271 | 0 | 1
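A table of this form is what Stata's summarize command produces; a minimal sketch (with no variable list, summarize reports every variable, including the string variable Ragionesoc~e, which shows 0 numeric observations):

* Descriptive statistics for all variables in the dataset
summarize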
We can now perform a skewness/kurtosis test (sktest) to see whether the 2020 data are normally distributed.
sktest Roe20 Roi20 ER20 Ros20 IR20 DFF20 IDL20 DR20
Skewness and kurtosis tests for normality ----- Joint test-----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
Roe20 | 260 0.0000 0.0000 138.34 0.0000
Roi20 | 260 0.0001 0.0017 21.31 0.0000
ER20 | 260 0.0128 0.0000 31.14 0.0000
Ros20 | 259 0.0000 0.0000 46.71 0.0000
IR20 | 260 0.0000 0.0000 182.22 0.0000
DFF20 | 260 0.0000 0.0000 148.23 0.0000
IDL20 | 260 0.3726 0.0617 4.31 0.1157
DR20 | 209 0.0301 0.0003 15.14 0.0005
As we can see, the only variable that passes the skewness/kurtosis normality test is IDL20, while all the others are not normally distributed; we will therefore consider a logit model rather than a probit model, since the probit rests on a normally distributed latent variable while the logit relies on the logistic distribution.
We performed the same test on the 2019 data; here none of the variables are normally distributed.

Skewness and kurtosis tests for normality (2019 data)

Variable | Obs | Pr(skewness) | Pr(kurtosis) | Adj chi2(2) | Prob>chi2
---|---|---|---|---|---
Roe19 | 260 | 0.0000 | 0.0000 | 149.02 | 0.0000
Roi19 | 260 | 0.0506 | 0.1183 | 6.12 | 0.0469
ER19 | 260 | 0.0016 | 0.0000 | 33.86 | 0.0000
Ros19 | 259 | 0.0000 | 0.0000 | 57.54 | 0.0000
IR19 | 260 | 0.0000 | 0.0000 | 213.64 | 0.0000
DFF19 | 260 | 0.0000 | 0.0000 | 216.11 | 0.0000
IDL19 | 260 | 0.0001 | 0.8270 | 13.97 | 0.0009
DR19 | 224 | 0.0000 | 0.5859 | 17.26 | 0.0002

We then estimated a linear regression of M on the 2020 indicators and D:

reg M Roe20 Roi20 ER20 Ros20 IR20 DFF20 IDL20 DR20 D

Source | SS | df | MS
---|---|---|---
Model | 21.934949 | 9 | 2.43721656
Residual | 30.1703142 | 199 | .151609619
Total | 52.1052632 | 208 | .250506073

Number of obs = 209
F(9, 199) = 16.08
Prob > F = 0.0000
R-squared = 0.4210
Adj R-squared = 0.3948
Root MSE =
Variable | Coefficient | Std. err. | t | P>|t| | [95% conf. interval]
---|---|---|---|---|---
Roe20 | .0042185 | .0023168 | 1.82 | 0.070 | -.0003502  .0087872
Roi20 | -.0140183 | .0074158 | -1.89 | 0.060 | -.0286419  .0006052
ER20 | .0106463 | .0053722 | 1.98 | 0.049 | .0000526  .02124
Ros20 | -.0073206 | .007593 | -0.96 | 0.336 | -.0222937  .0076526
IR20 | .0036319 | .0095735 | 0.38 | 0.705 | -.0152466  .0225104
DFF20 | -.072976 | .0300222 | -2.43 | 0.016 | -.1321785  -.0137736
IDL20 | -.0483297 | .1853279 | -0.26 | 0.795 | -.4137883  .317129
DR20 | .000319 | .00136 | 0.23 | 0.815 | -.0023628  .0030008
D | -.672862 | .0576375 | -11.67 | 0.000 | -.7865208  -.5592033
_cons | .8217176 | .1007959 | 8.15 | 0.000 | .6229524  1.020483

We then predicted the fitted values from this regression and summarized them:

Variable | Obs | Mean | Std. dev. | Min | Max
---|---|---|---|---|---
fitted | 209 | .4736842 | .3247406 | -.1843037 | 1.148549

The predicted values range between -.1843037 and 1.148549; however, our dependent variable is a dummy, so its values can only be 0 or 1.
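A minimal sketch of how the fitted values summarized above can be obtained after the regression (the commands are our reconstruction; only the variable name fitted appears in the original output):

* Linear prediction from the OLS regression of M; values need not lie in [0, 1]
predict fitted, xb
summarize fitted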
To check for multicollinearity among the regressors of the linear model, we computed the variance inflation factors:

Variable | VIF | 1/VIF |
---|---|---|
Ros20 | 7.77 | 0.128722 |
Roi20 | 5.30 | 0.188529 |
ER20 | 4.55 | 0.219947 |
Roe20 | 2.23 | 0.449383 |
IDL20 | 1.52 | 0.656607 |
DR20 | 1.52 | 0.659787 |
IR20 | 1.28 | 0.782229 |
DFF20 | 1.23 | 0.816258 |
D | 1.14 | 0.875056 |
Mean VIF | 2.95 |
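A VIF table of this kind is what estat vif reports after an OLS regression; a minimal sketch (re-estimating the linear model first, since estat vif must follow regress):

* Variance inflation factors for the linear specification above
reg M Roe20 Roi20 ER20 Ros20 IR20 DFF20 IDL20 DR20 D
estat vif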
Given the non-normal distributions found above, we now estimate the logit model.
logit M Roe20 Roi20 ER20 Ros20 IR20 DFF20 IDL20 DR20 D
Iteration 0: log likelihood = -144.57815
Iteration 1: log likelihood = -95.14023
Iteration 2: log likelihood = -94.502128
Iteration 3: log likelihood = -94.500333
Iteration 4: log likelihood = -94.500333
Logistic regression Number of obs = 209
LR chi2(9) = 100.16
Prob > chi2 = 0.0000
Log likelihood = -94.500333
Pseudo R2 = 0.3464

M | Coefficient | Std. err. | z | P>|z| | [95% conf. interval]
---|---|---|---|---|---
Roe20 | .0267739 | .0162373 | 1.65 | 0.099 | -.0050505  .0585984
Roi20 | -.0867751 | .0500846 | -1.73 | 0.083 | -.1849392  .011389
ER20 | .073364 | .0397563 | 1.85 | 0.065 | -.0045568  .1512849
Ros20 | -.0579823 | .0541187 | -1.07 | 0.284 | -.164053  .0480884
IR20 | .0216559 | .0637279 | 0.34 | 0.734 | -.1032485  .1465603
DFF20 | -.4237106 | .203156 | -2.09 | 0.037 | -.821889  -.0255322
IDL20 | -.3802979 | 1.232062 | -0.31 | 0.758 | -2.795095  2.034499
DR20 | .0031268 | .0095513 | 0.33 | 0.743 | -.0155935  .0218471
D | -3.563182 | .4509901 | -7.90 | 0.000 | -4.447106  -2.679257
_cons | 1.609232 | .6845157 | 2.35 | 0.019 | .2676055  2.950858

This is the full model. Before starting with the stepwise procedure, we investigate the correlation among our variables with pwcorr.
This is an exploratory analysis, so we will accept a maximum significance level of 10%.
pwcorr Roe20 Roi20 ER20 Ros20 IR20 DFF20 IDL20 DR20, star(0.05)
 | Roe20 | Roi20 | ER20 | Ros20 | IR20 | DFF20 | IDL20 | DR20
---|---|---|---|---|---|---|---|---
Roe20 | 1.0000 | | | | | | |
Roi20 | 0.6648* | 1.0000 | | | | | |
ER20 | 0.3218* | 0.5454* | 1.0000 | | | | |
Ros20 | 0.4382* | 0.7428* | 0.8284* | 1.0000 | | | |
IR20 | -0.0220 | -0.1578* | -0.2130* | -0.3128* | 1.0000 | | |
DFF20 | 0.0678 | 0.1305* | 0.0264 | 0.0706 | -0.1435* | 1.0000 | |
IDL20 | 0.0461 | -0.2285* | 0.0307 | -0.1898* | 0.4499* | -0.2233* | 1.0000 |
DR20 | -0.1025 | -0.2856* | -0.0458 | -0.1308 | -0.1123 | -0.3275* | 0.3993* | 1.0000
Since Roi20 is significantly correlated with all the other regressors, we could try to estimate the logit model without it.
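A minimal sketch of that re-estimation, simply dropping Roi20 from the full specification (our reconstruction; the paper reports only the stepwise search that follows):

* Full logit model without Roi20
logit M Roe20 ER20 Ros20 IR20 DFF20 IDL20 DR20 D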
We then tried a stepwise backward-elimination procedure: at each step we dropped the variable with the highest p-value and re-estimated the model. Proceeding this way, we did not find a single model whose regressors were all significant. Hence, we decided to restart the same process keeping Roi20 fixed; the final model we obtained is reported after the sketch below.
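A minimal sketch of how this backward elimination with Roi20 locked in could be automated with Stata's stepwise prefix (the 10% removal threshold is our assumption, and the automated search may select a slightly different model than the manual procedure described above):

* Backward elimination at the 10% level, keeping Roi20 in the model
stepwise, pr(0.10) lockterm1: logit M (Roi20) Roe20 ER20 Ros20 IR20 DFF20 IDL20 DR20 D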
logit M Roi20 DFF20 D

Iteration 0: log likelihood = -179.94125
Iteration 1: log likelihood = -115.134
Iteration 2: log likelihood = -114.37502
Iteration 3: log likelihood = -114.37221
Iteration 4: log likelihood = -114.37221

Logistic regression Number of obs = 260
LR chi2(3) = 131.14
Prob > chi2 = 0.0000
Log likelihood = -114.37221
Pseudo R2 = 0.3644

M | Coefficient | Std. err. | z | P>|z| | [95% conf. interval]
---|---|---|---|---|---
Roi20 | -.056127 | .0217583 | -2.58 | 0.010 | -.0987725  -.0134815
DFF20 | -.3322477 | .1618238 | -2.05 | 0.040 | -.6494165  -.0150789
D | -3.621511 | .3896568 | -9.29 | 0.000 | -4.385225  -2.857798
_cons | 2.097647 | .3477647 | 6.03 | 0.000 | 1.41604  2.779253
This is the final result: the most significant variables are Roi20, DFF20, and D. We also checked the pairwise correlations again (see the sketch below) and found no multicollinearity problems.
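A minimal sketch of that check on the retained regressors (the command is not reported in the original; the 5% star threshold mirrors the earlier pwcorr):

* Pairwise correlations among the regressors of the final model
pwcorr Roi20 DFF20 D, star(0.05)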
The pseudo R2 of 0.3644 is McFadden's measure: the fitted model improves on the log likelihood of the null (intercept-only) model by 36.44%.
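For reference, McFadden's pseudo R2 can be recovered from the log likelihoods reported in the output above:

Pseudo R2 = 1 - ln L(model) / ln L(null) = 1 - (-114.37221) / (-179.94125) ≈ 0.3644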
linktest
Iteration 0: log likelihood = -179.94125
Iteration 1: log likelihood = -114.61471
Iteration 2: log likelihood = -114.3718
Iteration 3: log likelihood = -114.37016
Iteration 4: log likelihood = -114.37016
Logistic regression Number of obs = 260
LR chi2(2) = 131.14
Prob > chi2 = 0.0000
Log likelihood = -114.37016
Pseudo R2 = 0.3644
M | Coefficient | Std. err. | z | P>|z| | [95% conf. interval]
---|---|---|---|---|---
_hat | 1.00172 | .1097661 | 9.13 | 0.000 | .7865822  1.216857
_hatsq | .0072008 | .112424 | 0.06 | 0.949 | -.2131462  .2275479
_cons | -.0178934 | .3258329 | -0.05 | 0.956 | -.6565142  .6207275
The variable _hat should be a statistically significant predictor, since it is the prediction from the model, and indeed it is (p = 0.000); this will be the case unless the model is completely misspecified. Conversely, _hatsq should have no explanatory power if the model is correctly specified: its p-value of 0.949 gives no evidence of specification error.
We will now investigate marginal effects with mfx compute:
- mfx compute at mean
Marginal effects after logit
y = Pr(M) (predict) = .46037663
variable | dy/dx Std. err. z P>|z| [ 95% C.I. ] X