Let us determine whether it is expedient to carry out this study taking into account the number of observations n and the number of factors m [4,8]. For such an estimate, we use the following relation:
(1)
In our case, 4 factors that affect the predicted parameter and 28 athletes observed under different factor values are considered. Thus:
43 <169 (2)
Therefore, it is expedient to carry out this study.
This method of assessing the feasibility of a study is one of many possible and is not the only correct one. According to this assessment, the study is expedient, since inequality (2) is satisfied, which indicates that the number of observations is sufficient for the number of factors to obtain reliable results. It should be remembered, however, that this assessment does not take into account other considerations that may affect the appropriateness of the study, such as the relevance of the topic and the availability of alternative methods.
Since the relationship between the prognostic and informative parameters is assumed to be linear and the parameter values are measured on strong (interval) scales, the strength of the relationship is estimated with the Pearson correlation coefficient for the pairs y and x1, y and x2, y and x3, y and x4. The value of the correlation coefficient r is calculated by the formula
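$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (3)$$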
where xi is the i-th numerical value of the informative parameter; x̄ is the mean value of the informative parameter; yi is the i-th numerical value of the predicted parameter; ȳ is the mean value of the predicted parameter; n is the sample size.
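The calculation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes formula (3) is the standard product-moment form of the Pearson coefficient; the function name `pearson_r` and the demonstration values are hypothetical and are not the athletes' measurements.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations.
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical illustrative values (assumption, not data from the study).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [9.8, 8.1, 6.2, 3.9, 2.0]
r_demo = pearson_r(x_demo, y_demo)
print(round(r_demo, 4))
```

A negative result, as in this demonstration, corresponds to an inverse relationship of the kind reported for y and x1.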
To calculate the Pearson correlation coefficient for each pair of parameters, the mean values of the initial data are calculated first, and formula (3) is then applied. The resulting correlation coefficients show which parameters have the greatest influence on the predicted parameter y. If the sample size is less than 100, the following formula is used to correct the value of the Pearson correlation coefficient [13]:
(4)
where is the corrected numerical value of the Pearson correlation coefficient.
Thus, for each of the relationships y and x1, y and x2, y and x3, y and x4, the corrected value of the Pearson correlation coefficient must be calculated using the formula above. Fig. 1 shows the correlation field of points describing the dependence of the predicted parameter y on the informative parameter x1, Fig. 2 that of y on x2, Fig. 3 that of y on x3, and Fig. 4 that of y on x4.
Tables 2, 3, 4 and 5 give the calculated values for determining the correlation coefficients between the parameters x1, x2, x3, x4 and y.
Fig. 1. Correlation field of points characterizing the dependence of y on x1.
Fig. 2. Correlation field of points characterizing the dependence of y on x2.
Fig. 3. Correlation field of points characterizing the dependence of y on x3.
Fig. 4. Correlation field of points characterizing the dependence of y on x4.
We substitute the calculated values into formula (3) and determine the values of the pair correlation coefficients for the relationships y and x1, y and x2, y and x3, y and x4. They are equal to -0.8424, 0.7704, -0.7254 and 0.8073, respectively. After the correction is introduced, the values of the sample coefficients are
, , , (5)
The values of the sample Pearson linear correlation coefficients indicate a strong relationship between the parameters. Since such a coefficient is a sample characteristic, its significance must be evaluated. We put forward the null hypothesis h0 that there is no linear correlation between the variables in the population (ρ = 0). The alternative hypothesis h1 is that the general correlation coefficient ρ is different from zero (ρ ≠ 0).
Since the sample size is 28, to test the hypothesis that there is no correlation between the parameters y and x1, y and x2, y and x3, y and x4, we use the Fisher transform [6]:
$$u = \frac{1}{2} \ln \frac{1 + r}{1 - r} \qquad (6)$$
The hypothesis test consists in comparing the calculated value u with the critical value calculated by the formula
$$u_{\alpha} = \frac{z_{1-\alpha/2}}{\sqrt{n - 3}} \qquad (7)$$
where z_{1-α/2} is the quantile of the standard normal distribution (1.96 for α = 0.05 and 2.576 for α = 0.01).
Substituting the required values into formulas (6) and (7), we obtain for the relationship y and x1 u = 1.2212, for the relationship y and x2 u = 1.0203, for the relationship y and x3 u = 0.9287, for the relationship y and x4 u = 1.1270, u0.05 = 0.392, u0.01 = 0.5152.
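The critical values quoted above can be checked with a short sketch. It assumes that the critical value on the u scale is the normal quantile divided by the square root of n - 3, an assumption that reproduces the reported 0.392 and 0.5152 for n = 28. The helper name `critical_u` is hypothetical, and the u values are taken from the text as given rather than recomputed.

```python
import math


def critical_u(z_quantile, n):
    """Critical value on the Fisher u scale, assumed to be z / sqrt(n - 3)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    return z_quantile / math.sqrt(n - 3)


n = 28
u_005 = critical_u(1.96, n)   # alpha = 0.05
u_001 = critical_u(2.576, n)  # alpha = 0.01

# u values as reported in the text for y versus x1..x4.
reported_u = {"x1": 1.2212, "x2": 1.0203, "x3": 0.9287, "x4": 1.1270}

# A relationship is treated as significant when |u| exceeds the critical value.
significant = {name: abs(u) > u_001 for name, u in reported_u.items()}
print(round(u_005, 4), round(u_001, 4), significant)
```

Under this assumption all four reported u values exceed the stricter critical value for α = 0.01.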
Figures 5, 6, 7 and 8 show the graphical representation of the values on the u axis for the relationships y and x1, y and x2, y and x3, y and x4, respectively.
Fig. 5. Graphical representation of the values on the u axis for the relationship between y and x1.
Fig. 6. Graphical representation of the values on the u axis for the relationship between y and x2.
Fig. 7. Graphical representation of the values on the u axis for the relationship between y and x3.
Fig. 8. Graphical representation of the values on the u axis for the relationship between y and x4.
According to Figs. 5, 6, 7 and 8, the calculated value of u falls into the critical region |u| > uα in every case, for both α = 0.05 and α = 0.01. The null hypothesis h0 is therefore rejected in favor of h1, which means that the correlation between the variables is considered significant.
To estimate the general Pearson linear correlation coefficient, it is necessary to calculate the lower and upper boundaries of the confidence interval, r1 and r2, by the formulas [4,8]
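$$r_1 = \tanh(u - u_{\alpha}) \qquad (8)$$
$$r_2 = \tanh(u + u_{\alpha}) \qquad (9)$$
where u is the value of the Fisher transform from formula (6) and uα is the critical value from formula (7).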
Substituting the necessary values into formulas (8) and (9), for the relationship between y and x1, we obtain r1 = 0.6800, r2 = 0.9236; for the relationship between y and x2, we obtain r1 = 0.5569, r2 = 0.8880; for the relationship between y and x3, we get r1 = 0.4905, r2 = 0.8670; for the relationship between y and x4, we obtain r1 = 0.6261, r2 = 0.9085.
Thus, with a confidence probability of 95%, the general coefficients of the Pearson linear correlation for the relationships y and x1, y and x2, y and x3, y and x4 lie within the boundaries
0.6800 < ρ < 0.9236; (10)
0.5569 < ρ < 0.8880; (11)
0.4905 < ρ < 0.8670; (12)
0.6261 < ρ < 0.9085. (13)
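The interval boundaries above can be reproduced from the reported u values with the following sketch. It assumes that the boundaries are obtained by shifting u by 1.96 / sqrt(n - 3) in each direction and transforming back with the hyperbolic tangent. The function name `fisher_confidence_bounds` is hypothetical, and the u values are those reported in the text.

```python
import math


def fisher_confidence_bounds(u, n, z_quantile=1.96):
    """Back-transform u -/+ z / sqrt(n - 3) to the correlation scale."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    half_width = z_quantile / math.sqrt(n - 3)
    return math.tanh(u - half_width), math.tanh(u + half_width)


# u values as reported in the text for y versus x1..x4 (n = 28).
reported_u = {"x1": 1.2212, "x2": 1.0203, "x3": 0.9287, "x4": 1.1270}

bounds = {name: fisher_confidence_bounds(u, 28) for name, u in reported_u.items()}
for name, (lower, upper) in bounds.items():
    print(f"y and {name}: {lower:.4f} < rho < {upper:.4f}")
```

The bounds are not symmetric about the sample coefficient, because the interval is symmetric on the u scale and the back-transformation is nonlinear.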
The confidence bounds for the relationships y and x1 and y and x4 indicate a moderate to strong correlation in the population.
We now determine whether it is expedient to use the fourth parameter as informative. For this, it is necessary to determine the degree of connection between the informative parameters. If the relationship is strong (the correlation coefficient is more than 0.3), then only the parameter with the highest correlation coefficient between y and x should be used, since the second informative parameter carries largely the same information about the predicted one. To evaluate this interconnection, we find the value of the Pearson correlation coefficient using formulas (3) and (4).
Table 6 shows the calculated values for determining the correlation coefficient between the parameters x1 and x2.
Calculated parameters of athletes in the sample after the training experiment
Thus, substituting the calculated values into formula (3), we obtain the value of the sample correlation coefficient for the interrelation of x1 and x2. It is equal to 0.9934. After the correction is introduced, the coefficient is
(12)
Substituting the values into formula (6), we obtain u = 2.8553.
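As a check on this value, the Fisher transform can be evaluated directly. The sketch below assumes that formula (6) is the usual transform u = 0.5 ln((1 + r) / (1 - r)); applied to the sample coefficient 0.9934 it agrees with the reported u to four decimal places. The function name `fisher_u` is hypothetical.

```python
import math


def fisher_u(r):
    """Fisher transform of a correlation coefficient (same as math.atanh)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


# Sample coefficient for x1 and x2 as reported in the text.
u_x1_x2 = fisher_u(0.9934)
print(round(u_x1_x2, 4))
```

The transform grows without bound as r approaches 1, which is why a coefficient this close to unity yields a u value far beyond the critical values.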
We put forward the null hypothesis h0 that there is no linear correlation between the variables in the population (ρ = 0). The alternative hypothesis h1 is that the general correlation coefficient ρ is different from zero (ρ ≠ 0).
Fig. 9 gives a graphical representation of the values on the u axis for the interrelation of x1 and x2.
Fig. 9. Graphical representation of the values on the u axis for the interrelation of x1 and x2.
From the graph in Fig. 9, it can be seen that the value of u falls into the critical region |u| > u0.01, which confirms the significance of the correlation between the variables. Next, the boundaries of the confidence interval for the correlation coefficient are calculated using formulas (8) and (9). Substituting the values, we obtain the boundaries of the confidence interval r1 = 0.9856 and r2 = 0.9970.
Thus, with a probability of 95%, the general coefficient of Pearson's linear correlation for the relationship between x1 and x2 lies in the range from 0.9856 to 0.9970. This means that, with a high degree of probability, there is a significant linear correlation between the variables x1 and x2.
Conclusion
If the results of the training experiment showed a strong correlation between the parameters x1 and x2, then it is possible that the parameters are multicollinear. This means that the parameters are highly correlated with each other, which can lead to model instability and poor prediction accuracy.
In this case, it is recommended to conduct additional data analysis to determine which of the parameters (x1 or x2) is more informative and necessary for forecasting. This can be done using dimensionality reduction or feature selection techniques such as principal component analysis or recursive feature elimination.
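A simpler alternative to the techniques named above is to apply the screening rule stated earlier in the text directly: when two informative parameters are correlated above 0.3, keep only the one more strongly correlated with y. The sketch below is one possible greedy implementation of that rule under stated assumptions. It is not the procedure used in this study; the function names and all data values are hypothetical.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired observations."""
    x_mean, y_mean = fmean(x), fmean(y)
    s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    s_xx = sum((a - x_mean) ** 2 for a in x)
    s_yy = sum((b - y_mean) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def screen_collinear(predictors, y, threshold=0.3):
    """Greedy screen (assumed procedure): rank predictors by |r| with y,
    then skip any predictor whose |r| with an already kept one exceeds
    the threshold."""
    ranked = sorted(
        predictors,
        key=lambda name: abs(pearson_r(predictors[name], y)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(
            abs(pearson_r(predictors[name], predictors[other])) <= threshold
            for other in kept
        ):
            kept.append(name)
    return kept


# Hypothetical data: x2 nearly duplicates x1, x3 is almost unrelated to both.
y = [10, 8, 7, 5, 3, 2]
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "x3": [1, -1, -1, 1, 1, -1],
}
kept = screen_collinear(predictors, y)
print(kept)
```

In this made-up example x2 is dropped because it is almost a copy of x1, while x3 is retained because it is nearly uncorrelated with x1, even though its own correlation with y is weak. A screen of this kind addresses redundancy only and says nothing about predictive value.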
In addition, if the goal of forecasting is to determine the causal relationship between parameters, then a deeper analysis should be carried out to establish which of the parameters is the cause and which is the effect. In such a case, using only one parameter for prediction may lead to incorrect conclusions.
Thus, when choosing informative parameters for forecasting, it is necessary to take into account many factors and to conduct additional data analysis in order to select the most informative and necessary parameters.
References
S. N. Andreev, Mini-futbol: metodicheskoye posobiye, Ares, 2004, 496-499 s.
Y.V. Verxoshanskiy, Programmirovaniye i organizatsiya trenirovochnogo protsessa, Fizkultura i sport, 1985, 175-181 s.
M.A. Godik, Sportivnaya metrologiya, Fizkultura i sport, 1988, 192-197 s.
T.N. Unguryanu and A.M. Grjibovskiy, Korrelyatsionniy analiz s ispolzovaniyem paketa statisticheskix program STATA, Ekologiya cheloveka, 2014, 60-64 s.
A.F. Grishin, Statisticheskiye modeli: postroyeniye, otsenka, analiz, Finansi i statistika, 2005, 416-422 s.
A. Petri and K. Sebin, Naglyadnaya statistika v meditsine. GEAOTAR-Med, 2003, 140-147 s.
O.Y. Rebrova, Statisticheskiy analiz meditsinskix dannix. Primeneniye paketa prikladnix program Statistika, MediaSfera, 2002, 312-319 s.
M.A. Xarchenko, Korrelyatsionniy analiz, IPTS VGU, 2008, 31-36 s.
V. Borovikov, Statistika. Iskusstvo analiza dannix na kompyutere: dlya professionalov, Piter, 2003, 413-418 s.
O.L. Erdonov, Control technology of Individual-Integrated Preparedness of High Qualification Competitors in Mini-Football at a Multi-Year Cycle, Düsseldorf, 2017, pp. 9-12.
V. L. Mutko, S. N. Andreyev and E. G. Aliyev, Mini-futbol - igra dlya vsex, Sovetskiy sport, 2007, 264 s.