Country Risk in International Investment: Its Structure and Methods of Estimation

In this paper, the issues of country risk assessment in the context of economic security and sustainability are investigated. The main object of research is country risk and its structural components. The paper's main goal is to analyze valuation methods of country risk from different perspectives and to propose a model for country risk measurement that allows an adequate evaluation of the level and dynamics of country risk, economic security, and economic sustainability, including their structural components and relationships. The paper addresses several main tasks. First, to highlight the importance of country risk evaluation and assessment in growing global markets, analyzing the causes and elements of country risk on the basis of prior research. Second, to explore and clarify the advantages and disadvantages of country risk assessment methods, and to investigate the sources of country risk and ways to manage it. Third, to apply quantitative and qualitative methods of analysis in order to formulate and present a model of country risk assessment in the context of economic security and sustainability, which identifies the factors influencing country risk and determines their direct and indirect relationships. The last task is to verify the practical suitability of the model by performing a worldwide empirical analysis and identifying directions for mitigating risk effects.

Each business operation involves some kind of risk. When business operations occur in an international dimension, they carry additional risks that are not typical of domestic operations. These additional risks are called country risks and usually include risks arising from a variety of national differences in policies, geography, economic structures, socio-political institutions, and currencies. Country risk analysis (CRA) addresses this problem by identifying the potential for these risks to decrease the expected return of cross-border investments.
The concept of "country risk" came into wide use in the 1970s. It was originally more professionally oriented, in the sense that it aimed at addressing the concrete issues of a particular business in a particular country, and was generally used by the banking industry.
Every year it becomes more difficult to analyze and predict changes in the financial, economic, and political sectors of business. The importance of country risk analysis is now better understood, and its scope is growing as more country risk rating agencies are established, combining a wide range of qualitative and quantitative information on alternative measures of economic, financial, and political risk into composite risk ratings. However, the accuracy of any rating agency with regard to any or all of these measures is open to question.
Globalization, after undermining the old definition of economic security, is found at the centre of a new definition that emphasizes the risks of unexpected shocks and economic volatility. The new definition must capture the causal consequences of globalization accurately and establish explicit benchmarks for assessing globalization's effects on economic security and country's economic sustainability.
Country risk assessment has been analyzed by different authors, but in a rather narrow way; in this paper the concept of country risk and its influencing factors are presented in an extended view.
The following scientific novelties in economics emerged: an expanded and consolidated overview of analyses of the country risk concept, its components, and related problems, examined from a new angle, which made it possible to identify new possibilities and challenges for creating a new model for the assessment of country risk.
A broader analysis of country risk, which includes not only political risk but also socio-economic aspects, presents a clear, newly analyzed concept that was not assumed in previous research.
There are many studies related to country risk, its financial integration in a country, its impact on the economy, and other aspects of a country's welfare. Summarizing the scientific literature on country risk, it is evident that researchers analyze the country risk approach only partially, without adapting the concept to growing globalization, which definitely changes the country risk approach. The country risk concept should be analyzed and understood in a broader way. This updated approach is discussed below.

Credit Ratings
Credit rating agencies (CRAs) play a key role in financial markets by helping to reduce the information asymmetry between lenders and investors, on one side, and issuers, on the other, about the creditworthiness of companies or countries. The CRAs' role has expanded with financial globalization and has received an additional boost from Basel II, which incorporates the ratings of CRAs into the rules for setting weights for credit risk. Ratings tend to be sticky, lagging markets, and to overreact when they do change. This overreaction may have aggravated financial crises in the recent past, contributing to financial instability and cross-country contagion.
A credit rating is a current opinion and measure of the risk of an obligor with respect to a specific financial obligation, based on all available information. For this purpose, S&P and Fitch define risk as the probability of default (PD), whereas Moody's defines it as 'loss'.
The logic underlying the existence of CRAs is to solve the problem of the information asymmetry between lenders and borrowers regarding the creditworthiness of the latter. Issuers with lower credit ratings pay higher interest rates, embodying larger risk premiums than higher-rated issuers. Moreover, ratings determine the eligibility of debt and other financial instruments for the portfolios of certain institutional investors, due to national regulations that restrict investment in speculative-grade bonds.
Standard and Poor's ratings seek to capture only the forward-looking probability of the occurrence of default. They provide no assessment of the expected time of default or mode of default resolution and recovery values.
By contrast, Moody's ratings focus on the Expected Loss (EL), which is a function of both the Probability of Default (PD) and the expected Recovery Rate (RE). Thus EL = PD × (1 − RE). For example, with PD = 2% and RE = 40%, EL = 0.02 × 0.6 = 1.2%.
Fitch's ratings also focus on both PD and RE. They have a more explicitly hybrid character, in that analysts are also reminded to be forward-looking and to be alert to possible discontinuities between past track records and future trends.

Models Used
A variety of statistical methods was employed to estimate models of debt rescheduling in the studies cited above. Since most authors chose their dependent variable to be a discrete binary variable, which took on the value one when a country 'rescheduled' within a given time-period and zero otherwise, the statistical methods used have been those designed for dichotomous dependent variables. These methods include discriminant analysis, linear-probability, probit, and logit models. In this section, we briefly describe each of these methods and then discuss criteria to use when choosing among them. Perforce, our discussion will be brief.
The methods used by the banks and other agencies for country risk analysis can broadly be classified as qualitative or quantitative. However, many agencies amalgamate both qualitative and quantitative information into a single index or rating. The data were collected from various sources, including expert panels, surveys, staff analysis, and published data sources. The country risk index can be either ordinal or scalar. A survey conducted by the US Export-Import Bank in 1976 categorized the methods of country risk appraisal used mainly by the banks into one of four types: (1) fully qualitative method, (2) structured qualitative method, (3) checklist method, and (4) other quantitative methods. Since our focus in this paper is on quantitative methods, we will only briefly discuss the other three categories.
Discriminant analysis. Discriminant analysis finds a set of prediction equations based on independent variables that are used to classify individuals into groups. There are two possible objectives in a discriminant analysis: finding a predictive equation for classifying new individuals or interpreting the predictive equation to better understand the relationships that may exist among the variables.
Objectives. The main objectives of discriminant analysis are: development of discriminant functions; examination of whether significant differences exist among the groups in terms of the predictor variables; determination of which predictor variables contribute most to the intergroup differences; and evaluation of the accuracy of classification.

Discriminant analysis linear equation.
Discriminant analysis involves the determination of a linear equation, similar to regression, that predicts the group to which a case belongs. The form of the function is:

D = a + v1X1 + v2X2 + … + viXi,

where D is the discriminant function, v is the discriminant coefficient or weight for a variable, X is the respondent's score for that variable, a is a constant, and i is the number of predictor variables.
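The linear discriminant function above can be sketched in a few lines of code; the weights, constant, and cutoff below are hypothetical illustration values, not estimates from any data in this paper.

```python
# Minimal sketch of a linear discriminant function D = a + sum(v_i * X_i).
# The weights v, constant a, and cutoff are hypothetical illustration values.

def discriminant_score(x, v, a):
    """Compute D = a + v1*x1 + ... + vn*xn for one case."""
    return a + sum(vi * xi for vi, xi in zip(v, x))

def classify_case(x, v, a, cutoff=0.0):
    """Assign the case to group 1 if D exceeds the cutoff, else group 2."""
    return 1 if discriminant_score(x, v, a) > cutoff else 2

v = [0.8, -0.5]   # hypothetical discriminant weights
a = 0.1           # hypothetical constant

print(classify_case([2.0, 1.0], v, a))  # D = 0.1 + 1.6 - 0.5 = 1.2 -> group 1
print(classify_case([0.0, 2.0], v, a))  # D = 0.1 + 0.0 - 1.0 = -0.9 -> group 2
```

The cutoff plays the role of the group boundary that the estimated discriminant function defines between the two categories.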
Assumptions of discriminant analysis. The main assumptions of discriminant analysis are: the observations are a random sample; each predictor variable is normally distributed.
Each of the allocations for the dependent categories in the initial classification are correctly classified.
There must be at least two groups or categories, with each case belonging to only one group so that the groups are mutually exclusive and collectively exhaustive (all cases can be placed in a group).
Each group or category must be well defined, clearly differentiated from any other group(s) and natural.
The groups or categories should be defined before collecting the data.
The attribute(s) used to separate the groups should discriminate quite clearly between the groups, so that group or category overlap is non-existent or minimal.
Group sizes of the dependent variable should not be grossly different and should be at least five times the number of independent variables.
K-nearest neighbours (k-NN) algorithm. The k-NN (k-nearest neighbours) algorithm is a classification algorithm that can be applied to question classification. However, its time complexity increases linearly with the size of the training set, which constrains the practical application of the algorithm. Simply stated, k-NN is an algorithm that classifies new cases based on similarity or distance measures between pairs of observations, such as Euclidean, cosine, etc. The k-NN algorithm is a lazy learner, i.e. it does not learn anything from the training tuples and simply uses the training data itself for classification. It is a non-parametric method used for classification and regression. Different types of prediction using data mining techniques are: (1) classification: predicting into what category or class a case falls;
(2) regression: predicting what numeric value a variable will have (if a variable varies with time, this is called 'time series' prediction).
Classification problems aim to identify the characteristics that indicate the group to which each case belongs. This pattern can be used both to understand the existing data and to predict how new instances will behave. Data mining creates classification models by examining already classified data (cases) and inductively finding a predictive pattern.
k-NN for classification. In pattern recognition, the k-NN algorithm is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. Figure 2 shows the k-NN decision rule for K = 1 and K = 4 for a set of samples divided into two classes. In Figure 2(a), an unknown sample is classified by using only one known sample; in Figure 2(b) more than one known sample is used. In the latter case, the parameter K is set to 4, so that the closest four samples are considered for classifying the unknown one. Three of them belong to the same class, whereas only one belongs to the other class. In both cases, the unknown sample is classified as belonging to the class on the left.
Distance metric. As mentioned before, k-NN makes predictions based on the outcome of the K neighbours closest to the query point. Therefore, to make predictions with k-NN, we need to define a metric for measuring the distance between the query point and the cases from the examples sample. One of the most popular choices is the Euclidean distance. Other measures include squared Euclidean, city-block, and Chebychev distances.
D(x, p) = √( Σi (xi − pi)² ), where x and p are the query point and a case from the examples sample, respectively.
The advantages of the k-NN method are as follows: 1. Analytically tractable. 2. Simple implementation. 3. Nearly optimal in the large-sample limit (N → ∞). 4. Uses local information, which can yield highly adaptive behaviour. 5. Lends itself very easily to parallel implementations.
The drawbacks of the k-NN method are as follows: 1. The k-NN algorithm is a lazy learner, i.e. it simply uses the training data itself for classification. 2. As a result, the method does not learn anything from the training data, which can mean that the algorithm does not generalize well; further, changing K can change the resulting predicted class label. 3. The algorithm may not be robust to noisy data. 4. To predict the label of a new instance, the k-NN algorithm finds the K closest neighbours to the new instance in the training data; the predicted class label is then set as the most common label among the K closest neighbouring points. 5. The algorithm needs to compute the distances and sort all the cases at each prediction, which can be slow if there is a large number of training examples.
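The decision rule described above (Euclidean distance plus a majority vote among the K nearest training cases) can be sketched as follows; the tiny two-class sample is invented for illustration.

```python
import math
from collections import Counter

def euclidean(x, p):
    """D(x, p) = sqrt(sum_i (x_i - p_i)^2)."""
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))

def knn_predict(query, training, k):
    """Classify `query` by majority vote among its k nearest training cases.

    `training` is a list of (feature_vector, class_label) pairs.
    """
    neighbours = sorted(training, key=lambda case: euclidean(query, case[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-class sample: class 'A' clusters near the origin,
# class 'B' near (5, 5).
training = [([0.0, 0.0], 'A'), ([1.0, 0.5], 'A'), ([0.5, 1.0], 'A'),
            ([5.0, 5.0], 'B'), ([4.5, 5.5], 'B'), ([5.5, 4.5], 'B')]

print(knn_predict([0.8, 0.8], training, k=3))  # 'A'
print(knn_predict([5.2, 4.9], training, k=3))  # 'B'
```

Note that the full training set is scanned and sorted at every prediction, which is exactly the computational drawback listed in point 5.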
Classification and regression trees (CART) method. CART is a machine learning method in which exploration is done by the decision tree technique. The method, developed by Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone, is a classification technique using a binary recursive partitioning algorithm.
Classification tree methods such as CART are a convenient way to produce a prediction rule from a set of observations described in terms of a vector of features and a response value. The aim is to define a general prediction rule that can be used to assign a response value to the cases solely on the basis of their predictor (explanatory) variables. Tree-structured classifications are not based on assumptions of normality and user-specified model statements, as are some conventional methods such as discriminant analysis and ordinary least squares regression.
Tree based decision methods are statistical systems that mine data to predict or classify future observations based on a set of decision rules and are sometimes called rule induction methods because the reasoning process behind them is clearly evident when browsing the trees.
In CART, the observations are successively separated into two subsets based on associated variables significantly related to the response variable; this approach has an advantage of providing easily comprehensible decision strategies.
CART can be applied either as a classification tree or as a regression tree, depending on whether the response variable is categorical or continuous. Tree-based methods are not based on any stringent assumptions. These methods can handle a large number of variables, are resistant to outliers, non-parametric, and more versatile, and can handle categorical variables, though they are computationally more intensive.
CART methodology. For building decision trees, CART uses a so-called learning set: a set of historical data with pre-assigned classes for all observations. An algorithm known as recursive partitioning is the key to the nonparametric statistical method of CART. It is a step-by-step process by which a decision tree is constructed by either splitting or not splitting each node on the tree into two daughter nodes.
An attractive feature of the CART methodology is that because the algorithm asks a sequence of hierarchical questions, it is relatively simple to understand and interpret the results. The unique starting point of a classification tree is called a root node and consists of the entire learning set L at the top of the tree.
Steps in CART. CART analysis consists of four basic steps. The first step consists of tree building, during which a tree is built using recursive splitting of nodes. Each resulting node is assigned a predicted class, based on the distribution of classes in the learning dataset that would occur in that node and on the decision cost matrix. The assignment of a predicted class to each node occurs whether or not that node is subsequently split into child nodes.
The second step consists of stopping the tree building process. At this point a "maximal" tree has been produced, which probably greatly overfits the information contained within the learning dataset.
The third step consists of tree "pruning," which results in the creation of a sequence of simpler and simpler trees, through the cutting off of increasingly important nodes.
The fourth step consists of optimal tree selection, during which the tree which fits the information in the learning dataset, but does not overfit the information, is selected from among the sequence of pruned trees. Each of these steps will be discussed in more detail below.
The tree-building process starts by partitioning a sample, or the root node, into binary nodes based upon a very simple question of the form "is X ≤ d?", where X is a variable in the data set and d is a real number. Initially, all observations are placed in the root node. This node is impure, or heterogeneous, because it contains observations of mixed classes. The goal is to devise a rule that will break up these observations and create groups or binary nodes that are internally more homogeneous than the root node. Starting from the root node, and using, for example, the Gini diversity index as a splitting rule, the tree-building process is as follows: 1. CART splits the first variable at all of its possible split points (at all of the values the variable assumes in the sample). At each possible split point of a variable, the sample splits into two binary child nodes. Cases with a "yes" response to the question posed are sent to the left node and those with "no" responses are sent to the right node.
2. CART then applies its goodness-of-split criterion to each split point and evaluates the reduction in impurity achieved by the split, Δi(s, t) = i(t) − pL i(tL) − pR i(tR), where i(t) is the impurity of the parent node t, tL and tR are the left and right child nodes, and pL and pR are the proportions of cases sent to each of them. 3. CART selects the best split of the variable as that split for which the reduction in impurity is highest.
4. Steps 1-3 are repeated for each of the remaining variables at the root node.
5. CART then ranks all of the best splits on each variable according to the reduction in impurity achieved by each split.
6. It selects the variable and its split point that most reduced the impurity of the root or parent node. 7. CART then assigns classes to these nodes according to the rule that minimizes misclassification costs. CART has a built-in algorithm that takes into account user-defined variable misclassification costs during the splitting process. The default is unit or equal misclassification costs.
8. Because the CART procedure is recursive, steps 1-7 are repeatedly applied to each nonterminal child node at each successive stage.
9. CART continues the splitting process and builds a large tree. The largest tree is built if the splitting process continues until every observation constitutes a terminal node. Obviously, such a tree will have a large number of terminal nodes, which will be either pure or have very few cases (fewer than 10).
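Steps 1-3 above can be illustrated with a small sketch that scans every "is X ≤ d?" split point of one variable and picks the split with the largest reduction in Gini impurity; the one-variable sample below is invented.

```python
def gini(labels):
    """Gini diversity index i(t) = 1 - sum_k p_k^2 for a node's class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try every 'is X <= d?' split; return (d, impurity reduction) of the best."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    for d in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= d]
        right = [lab for v, lab in zip(values, labels) if v > d]
        if not left or not right:
            continue  # a split must produce two non-empty child nodes
        # delta_i = i(t) - p_L * i(t_L) - p_R * i(t_R)
        reduction = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if reduction > best[1]:
            best = (d, reduction)
    return best

# Hypothetical one-variable sample: low values rescheduled ('1'), high did not ('0').
x = [1, 2, 3, 10, 11, 12]
y = ['1', '1', '1', '0', '0', '0']
d, red = best_split(x, y)
print(d, red)  # best split is "x <= 3", reduction 0.5 (both children become pure)
```

In the full algorithm this scan is repeated for every variable at every node, and the variable/split-point pair with the highest reduction is chosen.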
Summing up CART's strengths and weaknesses: 1. CART makes no distributional assumptions of any kind for dependent and independent variables. No variable in CART is assumed to follow any kind of statistical distribution.
2. The explanatory variables in CART can be a mixture of categorical and continuous.
3. CART has a built-in algorithm to deal with the missing values of a variable for a case, except when a linear combination of variables is used as a splitting rule.
4. CART is not at all affected by the outliers, collinearities, heteroskedasticity, or distributional error structures that affect parametric procedures. Outliers are isolated into a node and thus have no effect on splitting. Contrary to situations in parametric modeling, CART makes use of collinear variables in "surrogate" splits.
5. CART has the ability to detect and reveal variable interactions in the dataset. 6. CART does not vary under a monotone transformation of independent variables; that is, the transformation of explanatory variables to logarithms or squares or square roots has no effect on the tree produced.
7. In the absence of a theory that could guide a researcher, in a famine vulnerability study, for example, CART can be viewed as an exploratory, analytical tool. The results can reveal many important clues about the underlying structure of famine vulnerability.
8. CART's major advantage is that it deals effectively with large datasets and the issues of higher dimensionality; that is, it can produce useful results from a large number of variables submitted for analysis by using only a few important variables.
9. The inverted-tree-structure results generated from CART analysis are easy for anyone to understand in any discipline.
However, CART analysis does have some limitations. CART is a blunt instrument compared to many other statistical and analytical techniques. At each stage, the subdivision of data into two groups is based on only one value of only one of the potential explanatory variables. If a statistical model that appears to fit the data exists, and if its basic assumptions appear to be satisfied, that model would, in general, be preferable to a CART tree.
A weakness of the CART method and, hence, of the conclusions it may yield is that it is not based on a probabilistic model. There is no probability level or confidence interval associated with predictions derived from a CART tree that could help classify a new set of data. The confidence that an analyst can have in the accuracy of the results produced by a given CART tree is based purely on that tree's historical accuracy -how well it has predicted the desired response in other, similar circumstances.
SVM model. The support vector machine (SVM) method, which is widely used in artificial intelligence research, is based on statistical learning theory, combining VC dimension theory with the principle of structural risk minimization. It seeks the best compromise between model complexity and learning ability, in order to obtain the best generalization ability.
Classification algorithm of SVM. SVM, in the linearly separable case, is developed from the optimal separating hyperplane. Figure 3 shows the basic principle of SVM. In the diagram, solid and hollow points represent the two kinds of samples; H is the classification hyperplane, and H1 and H2 are hyperplanes parallel to H that pass through the samples closest to H. They are at equal distances from H, and the distance between them is termed the classification interval (margin). The optimal separating hyperplane is the one that separates the two classes with the largest margin; for such a problem of classifying two kinds of samples, the optimal separating hyperplane has maximum stability and high generalization ability.
Steps involved in the design of an SVM are as follows: 1. The hyperplane acting as the decision surface is defined as Σi αi di K(x, xi) = 0, where K(x, xi) = φ(x)ᵀφ(xi) represents the inner product of the two vectors induced in the feature space by the input vector x and the input pattern xi pertaining to the i-th example; this term is referred to as the inner-product kernel. 2. The optimum Lagrange multipliers αi are found by maximizing the objective function Q(α) = Σi αi − ½ Σi Σj αi αj di dj K(xi, xj), subject to the following constraints: Σi αi di = 0 and 0 ≤ αi ≤ C, where C is a user-specified positive parameter. 3. The linear weight vector w0 corresponding to the optimum values of the Lagrange multipliers is determined using the formula w0 = Σi α0,i di φ(xi), where φ(xi) is the image induced in the feature space due to xi; the first element of w0 represents the optimum bias b0.
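For the linearly separable case with the identity feature map φ(x) = x, the final step reduces to w0 = Σi αi di xi, and classification is by the sign of w0·x + b0. The multipliers, labels, support vectors, and bias below are hypothetical values chosen only so that the sketch is self-contained.

```python
# Sketch of the final SVM steps in the linearly separable case (phi(x) = x):
# build w0 from the support vectors and classify by sign(w0 . x + b0).
# The alphas, labels, support vectors, and bias are hypothetical.

def weight_vector(alphas, labels, support_vectors):
    """w0 = sum_i alpha_i * d_i * x_i (identity feature map)."""
    dim = len(support_vectors[0])
    w = [0.0] * dim
    for a, d, x in zip(alphas, labels, support_vectors):
        for j in range(dim):
            w[j] += a * d * x[j]
    return w

def svm_classify(x, w, b):
    """Decision rule: sign of w . x + b."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

alphas = [0.5, 0.5]                       # hypothetical Lagrange multipliers
labels = [+1, -1]                         # class labels d_i
support_vectors = [[2.0, 2.0], [0.0, 0.0]]
w0 = weight_vector(alphas, labels, support_vectors)   # -> [1.0, 1.0]
b0 = -2.0                                 # hypothetical optimum bias

print(svm_classify([3.0, 3.0], w0, b0))   # +1 (one side of the hyperplane)
print(svm_classify([0.5, 0.5], w0, b0))   # -1 (the other side)
```

Solving the dual problem of step 2 for the multipliers is omitted here; the sketch only shows how w0 and the decision rule follow once the αi are known.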

Application of the Building Models Process to the Countries' Conditions
Data source. The statistical data on the economic and financial variables considered in this paper come from the following sources: the International Monetary Fund (World Economic Outlook database), http://www.imf.org/external/index.htm, and the World Bank (World Development Indicators database), http://www.worldbank.org/.
Exogenous variables. As exogenous variables, I used 387 macroeconomic factors covering the period from 1980 to 2016 for 217 countries, on the basis of which the model has been reproduced. Concerning the open-source data, the varying density of data coverage from one factor to another should be taken into account. Furthermore, the coverage of most factors was very low, which makes their usage problematic. Some factors are completely filled for the entire time period, but some of the data are only partially filled.
Endogenous variables. As a response, I used the credit ratings of countries from the leading rating agencies S&P, Moody's, and Fitch. We obtained a history of ratings since 1949. It is worth mentioning that in the initial period the assignment of ratings covered a limited number of countries. Only since 1990 has the sample of ratings become sufficiently representative to construct statistical classification models.
Conversion of the response into a numerical scale. To quantify the data of the three rating agencies, to classify the classes, and to make it possible to apply classification models together with generalized regression models, the response was digitized.
We converted the Standard & Poor's rating scale (ranging from AAA to SD) into a numerical scale (ranging from 0 to 100) and shall liberally refer to both of them as S&P ratings. Table 1 lists the different country risk levels, or labels, used by Standard & Poor's, Moody's, and Fitch, and also provides the descriptions associated with these labels. Countries assigned a label below BB+ are considered non-investment-grade (speculative) countries. Countries rated CCC+ or lower are regarded as presenting serious default risks. BB indicates the least degree of speculation and CC the highest. Ratings from AA to CCC can be modified by the addition of a plus or minus sign to show relative standing within the major rating categories. We consider such subcategories as separate ratings in our analysis.
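The letter-to-number conversion can be sketched as a simple lookup table. The numeric values below are a hypothetical, evenly spaced mapping onto 0-100 for an ordered list of labels; they are not the exact values assigned in Table 1.

```python
# Hypothetical digitization of S&P-style labels onto a 0-100 scale
# (SD worst -> 0, AAA best -> 100); the exact values used in the
# paper's Table 1 are not reproduced here.

SP_SCALE = ["SD", "CC", "CCC-", "CCC", "CCC+", "B-", "B", "B+",
            "BB-", "BB", "BB+", "BBB-", "BBB", "BBB+",
            "A-", "A", "A+", "AA-", "AA", "AA+", "AAA"]

def rating_to_score(label):
    """Map a rating label to [0, 100], evenly spaced along the ordered scale."""
    idx = SP_SCALE.index(label)          # raises ValueError for unknown labels
    return 100.0 * idx / (len(SP_SCALE) - 1)

print(rating_to_score("AAA"))  # 100.0
print(rating_to_score("SD"))   # 0.0
print(rating_to_score("BB+") > rating_to_score("BB-"))  # True: plus outranks minus
```

Treating each plus/minus subcategory as its own entry in the ordered list mirrors the paper's choice to handle subcategories as separate ratings.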
Data conversion. Using the Matlab software environment, we converted the data into the required format.
Missing data. The vital criterion is the availability of complete and reliable statistics. We are eager to avoid difficulties related to missing ratings data that could reduce the statistical significance and the scope of our analysis. Accordingly, we exclude all rows from the training sample in which the values of the ratings are not specified. As a result, the volume of the training sample was reduced to 2395 points. A vast number of variables have small coverage; hence we exclude those factors for which the percentage of missing data exceeds 40. Table 7 in the Appendix lists the factors' coverage.
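The coverage filter described above can be sketched as follows; the factor names and the toy observations (with None marking gaps) are invented for illustration.

```python
# Sketch of the coverage filter: drop every factor whose share of missing
# values (None) exceeds 40%. Factor names and values are hypothetical.

def missing_share(values):
    """Fraction of missing (None) observations for one factor."""
    return sum(v is None for v in values) / len(values)

def filter_factors(factors, max_missing=0.40):
    """Keep only the factors whose missing share does not exceed the cap."""
    return {name: vals for name, vals in factors.items()
            if missing_share(vals) <= max_missing}

factors = {
    "gdp_per_capita": [1.0, 2.0, None, 4.0, 5.0],    # 20% missing -> kept
    "broad_money":    [None, None, None, 4.0, 5.0],  # 60% missing -> dropped
}

print(sorted(filter_factors(factors)))  # ['gdp_per_capita']
```

The same pass, applied to all 387 factors, is what reduces the candidate set before the correlation analysis.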
Correlation analysis. We conducted a correlation analysis between the 164 factors and the response, in order to select the factors to be used in further analysis and model construction. The correlation coefficients were calculated taking into account the presence of gaps in these factors; for this purpose, a dedicated MATLAB program was used. Next, we sorted the factors in descending order of the absolute value of their correlation with the response.
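Gap-aware correlation, as described above, can be sketched by computing Pearson's r over only those observations where both the factor and the response are present; the data below are invented.

```python
import math

def corr_with_gaps(factor, response):
    """Pearson correlation using only rows where both series are present."""
    pairs = [(f, r) for f, r in zip(factor, response)
             if f is not None and r is not None]
    n = len(pairs)
    mf = sum(f for f, _ in pairs) / n
    mr = sum(r for _, r in pairs) / n
    cov = sum((f - mf) * (r - mr) for f, r in pairs)
    sf = math.sqrt(sum((f - mf) ** 2 for f, _ in pairs))
    sr = math.sqrt(sum((r - mr) ** 2 for _, r in pairs))
    return cov / (sf * sr)

factor = [1.0, 2.0, None, 4.0, 5.0]      # a factor with one gap
rating = [10.0, 20.0, 30.0, 40.0, None]  # the response with one gap

# Only the three complete pairs (1,10), (2,20), (4,40) enter the computation,
# and they are perfectly linear, so r is 1.
print(round(corr_with_gaps(factor, rating), 6))
```

Sorting the factors by the absolute value of this coefficient then gives the ranking used for the subsequent factor selection.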
Factor selection for model construction. The criterion is the significance of variables for estimating a country's creditworthiness. We performed an extensive literature review, which played an important role in defining the set of candidate variables for inclusion in our model. Based on the correlation matrix, the economic interpretation of the factors, and our own judgment, we selected a certain number of factors for model construction. It is worth mentioning that during factor selection we omitted those factors that had a close economic sense and, correspondingly, were too strongly correlated with each other. Thus, we tried to avoid the problem of multicollinearity. The factors that participated in model construction are listed below. Furthermore, the correlation between the factors and the response has been clarified. Table 2 presents the correlation matrix between the selected factors and the response.
The interpretation of the correlation matrix coefficients. For the purpose of analyzing the correlation matrix, we introduce the notion of coupling force. It is generally accepted that the strength of the correlation coefficient, as one indicator of a boundary measure, is differentiated into three levels for both positive and negative correlations. First and foremost, we commence with analyzing the correlation coefficients between the predictors and the response. As the correlation matrix shows, there are both positive and negative correlation coefficients. According to the notion of coupling force, we identify the following factors as ones with a positive strong coupling force: Adjusted net national income per capita (constant 2010 US$), Household final consumption expenditure per capita (constant 2010 US$), and GDP per capita (constant 2010 US$). To make the analysis clearer and more concise, we clarify the correlation of each factor in more detail below.
Considering the first factor, Adjusted net national income per capita (constant 2010 US$), with the strongest correlation of 0.826, we may conclude that an increase in adjusted net national income per capita is associated with a considerable increase in the S&P rating. To make this easy to understand: with an increase in net national income per capita, consumption increases, savings also increase, and therefore the economy of the country is boosted. Hence, the country becomes more reliable to invest in, as the probability of debt repayment, i.e. of meeting the obligations, increases. Therefore, the rating agencies assign a higher rating to this country.
Moving on to Household final consumption expenditure per capita (constant 2010 US$), with a correlation of 0.802, we see a similar tendency: with the growth of this indicator, the assigned rating rises in the same way. To make it clear, we trace the following chain. Household final consumption expenditure increases, which stimulates production; therefore the demand for goods and services grows and the whole economy flourishes. Hence, the country becomes more reliable to invest in, as the probability of debt repayment, i.e. of meeting the obligations, increases, which ultimately boosts the country rating.
Covering the last indicator in this group, GDP per capita (constant 2010 US$), it is obvious that its correlation with the response is somewhat lower than that of the above-mentioned pair of factors, but still strong, at 0.781. With an increase in GDP per capita, consumption, demand, savings, and living standards all expand. Hence, the country becomes more reliable to invest in, as the probability of debt repayment, i.e. of meeting the obligations, increases. Therefore, the rating agencies assign a higher rating to this country. Moreover, among the selected factors, we may refer the following to the group with a positive moderate coupling force: Adjusted savings: consumption of fixed capital (% of GNI), Urban population (% of total), Exports of goods and services (constant 2010 US$), Broad money to total reserves ratio, and General government final consumption expenditure (% of GDP).
The first of these, Adjusted savings: consumption of fixed capital (% of GNI), has the strongest correlation with the response within this group, 0.515. Consequently, with an increase in this indicator, the S&P rating also increases, but more gradually. The chain here is as follows: when the level of savings increases, consumption (demand) may decrease, and with it production and GDP. Ultimately, the country rating is less likely to increase.
The next one, Urban population (% of total), has a slightly lower correlation of 0.470 compared with the previous factor. Consequently, we may state that with an increase in this indicator, the rating will slightly increase as well. It should be emphasized that the larger the urban population, the smaller the rural one. Accepting this, the overall economy of the country shifts from an agricultural to an industrial type; in other words, industrialization may occur, which in turn increases production capacity, the level of consumption, living standards, the volume of savings and, ultimately, GDP. Hence, the country becomes more reliable to invest in, as the probability of debt repayment, i.e. of meeting the obligations, increases, and the rating agencies assign a higher rating to such a country.
The following factor, Exports of goods and services (constant 2010 US$), has a correlation of 0.446, roughly the same as the previous one. As exports of goods and services expand, the rating undergoes only a minor change. The overall chain is as follows: with an increase in exports of goods and services, production capacity increases; hence, unemployment decreases, living standards improve and, finally, GDP grows. The country therefore becomes more reliable to invest in, as the probability of debt repayment, i.e. of meeting the obligations, increases, and the rating agencies assign a higher rating to such a country.
The last two factors in this group have approximately the same correlation coefficients. The first, Broad money to total reserves ratio, has a relatively small correlation of 0.380, which means that a small increase in this ratio is unlikely to alter the rating. To be clear and concise: as this ratio increases, the volume of broad money expands, bringing about inflation growth. In response, prices rise; hence, demand moves in the opposite direction, followed by a shrinkage of supply. Ultimately, overall economic development is hampered, and the rating is less likely to be enhanced.
The last factor in this group, General government final consumption expenditure (% of GDP), has a correlation of 0.379. Consequently, with an increase in this indicator, the S&P rating also increases, but more gradually. The chain of reasoning is as follows: an increase in this indicator means that general demand enters a growth phase. This stimulates an increase in supply, which in turn leads to the expansion of the sectors of the economy and, finally, to economic development. Hence, the country becomes more reliable to invest in, as the probability of debt repayment, i.e. of meeting the obligations, increases, and the rating agencies assign a higher rating to such a country.
The final group of factors that ought to be mentioned has a negative moderate coupling force. The first of these, Household final consumption expenditure, etc. (% of GDP), has a correlation of -0.563. To be concise, this means that with an increase in this indicator, the country rating will diminish.
To make this clear: if household consumption increases, less money is left for savings (bank deposits). Owing to the shortage of deposits, banks will not have sufficient funds for credit operations and will resort to increasing interest rates (in the case of rejection of credit from the Central Bank), which makes access to funds more difficult. Ultimately, the volume of investment decreases sharply, slowing down GDP growth. Therefore, the rating is unlikely to improve.
The final factor, Lending interest rate (%), with a correlation of -0.450, implies that an increase in the lending interest rate deteriorates the country rating. In other words, an increase in the key interest rate accompanies the tight monetary policy proclaimed by the country's Central Bank. As a result, the circulation of cash among economic agents and in the economy as a whole falls significantly. In turn, this phenomenon gives rise to a shrinkage in the volume of investment, both within the country and from outside it.
It is worth mentioning that, according to the yield curve, an increased yield gives rise to increased risk exposure, which is similar to our context with interest rates. Ultimately, economic development deteriorates, country risks rocket and, in that case, the probability of an improved country rating goes down.
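To make the reasoning above reproducible, the per-factor correlations with the rating can be computed directly. The sketch below uses NumPy on made-up data, since the original World Bank sample is not reproduced here; the factor names, value ranges and noise levels are illustrative assumptions only.

```python
import numpy as np

# Synthetic stand-ins for two of the factors discussed above:
# one strongly positive, one negative (all values are made up).
rng = np.random.default_rng(0)
rating = rng.uniform(0, 100, size=200)                      # numeric S&P scale
factors = {
    "adj_net_income_per_capita": rating * 50 + rng.normal(0, 500, 200),
    "lending_interest_rate": -0.1 * rating + rng.normal(0, 3, 200),
}
correlations = {name: float(np.corrcoef(vals, rating)[0, 1])
                for name, vals in factors.items()}
for name, r in correlations.items():
    print(name, round(r, 3))
```

On the real sample, the same loop over all selected World Bank indicators yields the coefficients discussed above.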
Having described the correlations between the predictors and the response, it is time to analyze the correlations between the factors themselves.
Here we introduce the notion of multicollinearity, i.e. a strong mutual interconnection between factors within the sample. If it exists, it deteriorates the sample by imposing additional dependencies between factors. Ideally, the correlation between factors should tend to vanish (approach zero); in other words, the factors ought to be statistically independent.
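A quick way to detect such dependencies is to scan the factor correlation matrix for coefficients close to 1 in absolute value. The sketch below uses synthetic data with one deliberately near-duplicate column; the 0.9 threshold and the data are our assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))                           # three synthetic factors
X[:, 1] = 1.02 * X[:, 0] + rng.normal(0, 0.05, 300)     # near-duplicate of factor 0

corr = np.corrcoef(X, rowvar=False)                     # factor-by-factor matrix
collinear_pairs = [(i, j)
                   for i in range(corr.shape[0])
                   for j in range(i + 1, corr.shape[1])
                   if abs(corr[i, j]) > 0.9]
print(collinear_pairs)
```

Only the engineered pair of columns is flagged; on the real sample this scan surfaces the income/consumption pairs discussed below.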
Analyzing our sample, it can be clearly seen that there is high multicollinearity between the following factors: 1. Adjusted net national income per capita (constant 2010 US$) and Household final consumption expenditure per capita (constant 2010 US$).
The multicollinearity between these pairs of factors is almost 1, which considerably deteriorates our sample. Conducting the correlation analysis of the data sample, we may conclude that the majority of factors have a high correlation with the response value. The highest value is presented by Adjusted net national income per capita (constant 2010 US$) and the lowest by General government final consumption expenditure (% of GDP), respectively.
Summing up the descriptive statistics carried out, namely the Group Means and Confidence Intervals, the Descriptive Statistics for the group of factors, and the Correlation matrix, we can conclude the following. On the one hand, the data sample offers a variety of advantages: 1. First and foremost, the majority of factors have relatively high correlation coefficients with the response value.
2. Moreover, intense multicollinearity is not observed across almost all factors.
The analysis of the relationship between factors and rating is based on mean group values and confidence intervals. With the help of the descriptive statistics concerning Group Means and Confidence Intervals, we may conclude that almost all factors have a positive correlation with the response, except Household final consumption expenditure, etc. (% of GDP) and Lending interest rate (%).
In addition, since one value of a factor can correspond to several confidence intervals, countries can be assigned different ratings. Thus, we may conclude that for factor levels corresponding to different confidence intervals, the value of the rating is adjusted by other factors.
It is evident that low rating values are characterized by a narrow confidence interval, owing to a small spread of factor values between the minimum and maximum. Meanwhile, as the rating increases, the spread of factor values rises as well, which leads to wide confidence intervals.
It is also important that, owing to the peculiarity of the data sample, for high rating values, namely from 80 to 90, there is a low number of observations, which leads to fluctuations in both the mean value and the confidence interval.
The summary of the descriptive statistics for the group of factors. Conducting the descriptive analysis for the group of factors, we may conclude that almost all factors have a multimodal distribution (two modes). Consequently, it is far from the normal distribution. Besides, the sample has right-sided asymmetry, as evidenced by a positive skewness; in other words, the mean and the median are greater than the mode. This suggests that our data is shifted to the left relative to the normal distribution. Consequently, there is an abnormality in the distribution of the values of the factors.
It should also be noted that the data sample exhibits excess kurtosis. For the major part of the data sample, the peak values are higher than those of the normal distribution. This is confirmed by kurtosis values above 3, whereas the normal value is 3.
However, the following factors differ from the rest of the data sample: Household final consumption expenditure, etc. (% of GDP), Adjusted savings: consumption of fixed capital (% of GNI), and Urban population (% of total).
The distribution for this group of factors is unimodal (one mode). Consequently, we may conclude that it is close to the normal distribution.
However, this part of the sample has left-sided asymmetry, as evidenced by a negative value of skewness. In other words, in comparison with the previous factors, the mean and the median are still greater than the mode, but not by much. This suggests that our data is slightly shifted to the right relative to the normal distribution. Consequently, the distribution of the values of these factors is close to normal.
Furthermore, the kurtosis of this part of the sample is below normal: the peak values are lower than those of the normal distribution, as confirmed by kurtosis values below 3, whereas the normal value is 3.
Conducting the tests of normality. Based on our assumption of the non-parametricity of the data, we are going to test our data for normality using a number of tests. The normality assumption is at the core of the majority of standard statistical procedures, and it is important to be able to test this assumption. In addition, showing that a sample does not come from a normally distributed population is sometimes of importance per se. Among the procedures used to test this assumption, one of the best known is the Kolmogorov-Smirnov test.
First, we verify the null hypothesis concerning the normality of the distribution using the Kolmogorov-Smirnov test. In Matlab, we used the kstest function. It returns a test decision for the null hypothesis that the data in vector x comes from a standard normal distribution, against the alternative that it does not come from such a distribution, using the one-sample Kolmogorov-Smirnov test. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, or 0 otherwise.
The findings of the Kolmogorov-Smirnov test showed that, throughout the sample, the null hypothesis is rejected for all factors. Therefore, we may conclude that normality is absent and our sample has a non-normal distribution.
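For readers working outside Matlab, the same test is available in Python via scipy.stats.kstest; like Matlab's kstest, it compares against a standard normal, so the data must be standardized first. The sketch uses synthetic non-normal data, not the paper's sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=500)       # clearly non-normal data
z = (x - x.mean()) / x.std(ddof=1)             # standardize before the test
stat, p_value = stats.kstest(z, "norm")        # one-sample KS against N(0, 1)
h = int(p_value < 0.05)                        # 1 = reject normality at 5%
print(h)
```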
Moreover, we conducted an additional test to verify the null hypothesis concerning the normality of the distribution using the Anderson-Darling test. In Matlab, we used the adtest function. It returns a test decision for the null hypothesis that the data in vector x is from a population with a normal distribution, using the Anderson-Darling test. The alternative hypothesis is that x is not from a population with a normal distribution. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, or 0 otherwise.
The findings of the Anderson-Darling test matched those of the Kolmogorov-Smirnov test: throughout the sample, the null hypothesis is rejected for all factors. Therefore, we may conclude that normality is absent and our sample has a non-normal distribution.
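The Anderson-Darling counterpart in Python is scipy.stats.anderson, which returns the statistic together with critical values per significance level rather than a p-value; again, the data below is synthetic, standing in for the real sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=500)       # clearly non-normal data
result = stats.anderson(x, dist="norm")
# reject H0 at the 5% level when the statistic exceeds the critical value
i5 = list(result.significance_level).index(5.0)
h = int(result.statistic > result.critical_values[i5])
print(h)
```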
Construction and verification of classification models. In our scientific paper, we have been using both a special application, namely the Matlab Toolbox, and code written by ourselves. Let us consider the Matlab Toolbox in more detail.
Construction of models, including missing data. After selecting and preparing a database of factors and responses, the first model was created using the "Classification Learner". The latter allows solving classification problems and building models in an interactive mode.
Using this approach, the training sample size is 2395 points.
Method 1. Classification and regression trees. The methods of classification and regression trees showed the following results; the accuracy of the various types of classification and regression trees is shown in Table 4.
Based on the accuracy analysis of the methods, we can conclude that Bagged Trees showed the highest accuracy of 51.5%. Therefore, we take this type of classification and regression trees as the basis for subsequent analyses and model constructions.
After analyzing the confusion matrix, we can conclude that this model is prone to overvaluation of the rating; to be concrete, it overestimates a set of points and assigns them a rating equal to 100, or AAA. In addition, we see that the model does not identify the critical cases, namely countries that are bankrupt or close to it (ratings 0, 10, 15, 20).
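Outside the Classification Learner, an equivalent of the Bagged Trees model can be sketched with scikit-learn's BaggingClassifier, whose default base learner is a decision tree. Everything below (the data, tree count and class structure) is illustrative; the real model was trained on the 2395-point World Bank sample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic multi-class data standing in for the rating classes.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
model = BaggingClassifier(n_estimators=30, random_state=0)  # bagged decision trees
accuracy = cross_val_score(model, X, y, cv=5).mean()        # 5-fold CV accuracy
print(round(accuracy, 3))
```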
Method 2. Discriminant Analysis. The methods of discriminant analysis showed the following results. Proceeding from Table 5, we draw the following conclusions: 1. Linear discriminant analysis showed a classification accuracy of 23.6%, which indicates significant classification problems.
2. Unfortunately, quadratic discriminant analysis failed due to the degeneracy of the covariance matrix, which is caused by the strong collinearity between factors.
3. Subspace discriminant showed the best accuracy among the types of discriminant analysis, at 28.5%, but this accuracy is too low for classification.
After analyzing the confusion matrix, we can conclude that this model is prone to overvaluation of the rating; to be concrete, it overestimates a set of points and assigns them a rating equal to 100, or AAA. In addition, we see that the model does not identify the critical cases, namely countries that are bankrupt or close to it (ratings 0, 10, 15, 20). A very strong deviation from the true value is also noticeable, which indicates poor classification quality. The general conclusion: discriminant analysis gives an unsatisfactory result.
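The failure of quadratic discriminant analysis can be traced to the covariance matrix: with near-perfectly collinear factors its determinant is essentially zero, so the inverse required by the quadratic discriminant does not exist. A minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(size=300)
# second column is an almost exact multiple of the first (collinear pair)
X = np.column_stack([a, 1.02 * a + rng.normal(0, 1e-6, 300)])

cov = np.cov(X, rowvar=False)
det = float(np.linalg.det(cov))
print(det)   # practically zero, so the covariance matrix cannot be inverted
```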
Method 3. Support Vector Machine (SVM). The methods of SVM showed the following results. Proceeding from Table 6, we draw the following conclusions: 1. Based on the accuracy analysis of the methods, we can conclude that Cubic SVM showed the highest accuracy of 28.5%. Therefore, we take this type of SVM as the basis for subsequent analyses and model constructions.
2. After analyzing the confusion matrix, we can conclude that this model is prone to undervaluation of the rating; to be concrete, it underestimates a set of points and assigns them a rating equal to 0 (D).
3. A very strong deviation from the mean value is also noticeable, which indicates poor classification quality.

Table 5 The accuracy of various types of discriminant analysis

Type of Discriminant Analysis Accuracy (%)
Linear discriminant 24.8
Quadratic discriminant Failed
Subspace discriminant 28.5

(Table 4 also lists RUSBoosted Trees at 26.0%, and Table 6 lists Coarse Gaussian SVM at 16.8%.)

Table 7 The accuracy of various types of k-NN

Type of k-NN Accuracy (%)
Fine k-NN 41.7
Medium k-NN 33.9
Coarse k-NN 25.6
Cosine k-NN 33.4
Cubic k-NN 33.6
Weighted k-NN 40.9
Subspace k-NN 24.8
The general conclusion: SVM gives an unsatisfactory result.
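For reference, a "Cubic SVM" corresponds to a support vector machine with a third-degree polynomial kernel. A scikit-learn sketch on synthetic data (the scaler, the kernel degree mapping and the data are our illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=0)
# degree-3 polynomial kernel, i.e. the "Cubic SVM" preset
model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
accuracy = cross_val_score(model, X, y, cv=5).mean()
print(round(accuracy, 3))
```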
Method 4. k-nearest neighbors (k-NN). The methods of k-NN showed the following results.
Proceeding from Table 7, we draw the following conclusions: 1. Based on the accuracy analysis of the methods, we can conclude that Fine k-NN showed the highest accuracy of 41.7%. Therefore, we take this type of k-NN as the basis for subsequent analyses and model constructions.
2. After analyzing the confusion matrix, we can conclude that this model is prone to overvaluation of the rating; to be concrete, it overestimates a set of points and assigns them a rating equal to 100, or AAA. In addition, we see that the model does not identify the critical cases, namely countries that are bankrupt or close to it (ratings 0, 10, 15, 20).
3. A very strong deviation from the true value is also noticeable, which indicates a poor quality of classification.
The general conclusion: the model deviates substantially from the true values, but the accuracy estimates are at least better than those of Discriminant Analysis and the Support Vector Machine method. As a result, it can be concluded that the majority of the models built in the presence of missing data (NaN) in the original sample show insufficient accuracy; hence, this methodology cannot be applied.
However, it can be clearly seen that the method of classification and regression trees, as well as the nearest-neighbor method, showed quite good results. Consequently, in the following approach we will use only the methods with the highest accuracy: 1. Classification and Regression Tree: Bagged Trees.

Building Models with Exclusion of the Missing Data
When using this methodology, we exclude all points in which data is missing for at least one factor. As a result, our training sample shrank from 2395 to 1042 points. On the one hand, we have got rid of missing data, which positively influences the construction of the models; on the other, we lost a certain amount of information.

Method 1. Classification and regression trees: Bagged Trees. The results of building the model were shown as a Confusion Matrix Graph. Compared with the previous approach, we can see a considerable improvement: the accuracy of the classification has increased from 51.5% to 70.5%. When analyzing the matrix, we also see a significant enhancement in the accuracy of the classification; the points deviate much less from the true value. However, the model has a significant drawback: it was impossible to classify properly the critical countries with low ratings (bankrupt).
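The exclusion of incomplete observations described above amounts to a row-wise drop of missing values. A pandas sketch on a toy frame (the column names and values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gdp_per_capita": [1200.0, np.nan, 5300.0, 800.0],
    "lending_rate":   [7.5, 4.2, np.nan, 12.0],
    "rating":         [40, 85, 70, 25],
})
complete = df.dropna()   # keep only rows observed for every factor
print(len(df), "->", len(complete))
```

On the real sample, the same call is what shrinks the training set from 2395 to 1042 points.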
Method 2. Discriminant Analysis: Subspace Discriminant. The results of building the model were shown as a Confusion Matrix Graph. Compared with the previous approach, the accuracy of the model has improved, but not considerably: the accuracy of classification has increased from 28.5% to 31.5%. When analyzing the matrix, we did not notice significant improvements in comparison with the previously built models. As with classification and regression trees, the model has a significant drawback: it was impossible to classify properly the critical countries with low ratings (bankrupt).
Method 3. Support Vector Machine: Cubic SVM. The results of building the model were shown as a Confusion Matrix Graph. Compared with the previous approach, we can see a considerable improvement: the accuracy of the classification has increased from 28.5% to 68.2%. When analyzing the matrix, we also see a significant enhancement in the accuracy of the classification; the points deviate much less from the true value. However, the model has a significant drawback: it was impossible to classify properly the critical countries with low ratings (bankrupt).
Method 4. k-nearest neighbors: Fine k-NN. The results of building the model were shown as a Confusion Matrix Graph. Compared with the previous approach, we can see a considerable improvement: the accuracy of the classification has increased from 41.7% to 70.2%. When analyzing the matrix, we also see a significant enhancement in the accuracy of the classification.
The points deviate much less from the true value. However, the model has a significant drawback. It was impossible to classify properly critical countries with low ratings (bankrupt).
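As a point of reference, Matlab's "Fine KNN" preset classifies with a single nearest neighbor; in scikit-learn the same model is KNeighborsClassifier(n_neighbors=1). The data below is synthetic, and the "coarse" neighbor count is our illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)
fine = KNeighborsClassifier(n_neighbors=1)      # "fine": one neighbor
coarse = KNeighborsClassifier(n_neighbors=100)  # "coarse": many neighbors
fine_acc = cross_val_score(fine, X, y, cv=5).mean()
coarse_acc = cross_val_score(coarse, X, y, cv=5).mean()
print(round(fine_acc, 3), round(coarse_acc, 3))
```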
After conducting this approach, it can be concluded that all models have enhanced their results. The greatest increase in accuracy was obtained by the Bagged Trees, Cubic SVM and Fine KNN. Meanwhile Subspace Discriminant has not shown significant improvements.
Analysis of the uniformity of the distribution of sampling points by classes. It can be concluded that almost all models have enhanced their accuracy, but, unfortunately, none of them is able to classify properly the critical countries, those with low ratings (bankrupt), which are vital for us.
We are convinced that this was due to the uneven presence of countries with various ratings within the training sample, specifically: the moments when countries are assigned low ratings are relatively rare, meanwhile for countries with the highest rating, almost all points are generally known for all factors. Consequently, points with high ratings dominate in our training sample.
Below is the histogram of the distribution of our sample by classes.
The histogram shows that in classes that correspond to countries with a low rating (from 0 to 25), there is a very small number of points, whereas in the class that corresponds to the highest rating (100), there is a prevailing number of points. The highlighted fact strongly distorts the construction of the model.
The method of grouping and converting data in manual mode. Applying this method, we eliminated 30 points from the group with the highest rating. Moreover, we combined the groups with low ratings (from 0 to 25) into one group and set its value to 10.
The histogram clearly shows that the frequency of low-rated countries has increased significantly. Also, the quantity of points in the class with the highest rating was reduced up to 150.
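The manual regrouping can be sketched as follows: merge every class from 0 to 25 into a single class labelled 10, then thin the over-represented top class down to 150 points. The labels below are synthetic and the class probabilities are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.choice([0, 10, 15, 20, 25, 50, 75, 100], size=600,
               p=[0.01, 0.01, 0.01, 0.01, 0.01, 0.2, 0.25, 0.5])

y = np.where(y <= 25, 10, y)                     # merge low-rating classes into 10
top = np.flatnonzero(y == 100)                   # indices of the dominant class
drop = rng.choice(top, size=len(top) - 150, replace=False)
y = np.delete(y, drop)                           # thin the top class to 150 points
print(sorted(set(y.tolist())), int(np.sum(y == 100)))
```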
Method 1. Classification and Regression Trees: Bagged Trees. The grouping and data-conversion approach did not significantly affect the accuracy of the model: the accuracy increased from 70.5% to 72%. However, the graph shows that the amalgamation of the groups with low ratings (from 0 to 25), and the reduction in sampling points for countries with high ratings, had a positive effect on the model, specifically decreasing deviations from the true value. The model began assigning values to the class of countries with poor ratings.
In addition, the model has become less inclined to overestimate the values of the groups of countries with the highest rating.

Method 2. Discriminant Analysis: Quadratic Discriminant. The results of building the model are shown below as a Confusion Matrix Graph. Using this approach, the method of discriminant analysis changed from Subspace Discriminant to Quadratic Discriminant. This led to a significant improvement in the accuracy of the model, namely from 31.5% to 51.2%. The graph also shows that the amalgamation of the groups with low ratings (from 0 to 25), and the reduction in sampling points for countries with high ratings, had a positive effect on the model, specifically decreasing deviations from the true value. The model began assigning values to the class of countries with poor ratings, and has become less inclined to overestimate the values of the groups of countries with the highest rating.
Method 3. Support Vector Machine: Cubic SVM. The results of building the model were shown below as a Confusion Matrix Graph. The grouping and data-conversion approach did not improve the accuracy of this model; on the contrary, it deteriorated: the accuracy decreased from 68.2% to 66.1%. Nevertheless, the graph shows that the amalgamation of the groups with low ratings (from 0 to 25), and the reduction in sampling points for countries with high ratings, had a positive effect on the model, specifically decreasing deviations from the true value. The model began assigning values to the class of countries with poor ratings, and has become less inclined to overestimate the values of the groups of countries with the highest rating.
Method 4. k-nearest neighbors: Fine k-NN. The results of building the model are shown below as a Confusion Matrix Graph. The grouping and data-conversion approach did not significantly affect the accuracy of the model: the accuracy grew from 70.2% to 72.4%. However, the graph shows that the amalgamation of the groups with low ratings (from 0 to 25), and the reduction in sampling points for countries with high ratings, had a positive effect on the model, specifically decreasing deviations from the true value. The model began assigning values to the class of countries with poor ratings, and has become less inclined to overestimate the values of the groups of countries with the highest rating.
Ultimately, it should be concluded that the grouping and data-conversion approach did not significantly affect the accuracy across the models (Table 9): on average, the accuracy increased by approximately 3%. However, the amalgamation of the groups with low ratings (from 0 to 25), and the reduction in sampling points for countries with high ratings, had a positive effect on the models, specifically decreasing deviations from the true value. The models began assigning values to the class of countries with poor ratings, and have become less inclined to overestimate the values of the groups of countries with the highest rating.
Logarithmic scaling of factors. Having carried out a logarithmic scaling of the factors, it can be concluded that this method did not enhance the accuracy of the model.

Dynamics of factors with account taken of the previous year. So far, we have used cross-sectional models: in other words, the factors' data were fixed for the year of the rating, so we built static models. The next step in our study was to analyze the impact of changes in factors over time on a country's rating.
For this purpose, we merge the data of the current year with the data of the previous one for the selected factors. As a result, each factor enters the model as two vectors: the values of the current and of the previous year.
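In pandas terms, this amounts to a group-wise lag of each factor by country. A sketch on a toy panel (the countries, years and values are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year":    [2015, 2016, 2017, 2015, 2016, 2017],
    "gdp":     [1.0, 1.1, 1.3, 2.0, 1.9, 2.2],
})
df["gdp_prev"] = df.groupby("country")["gdp"].shift(1)  # previous-year vector
panel = df.dropna()      # the first observed year has no previous value
print(panel)
```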
Moreover, we stopped using the Matlab Toolbox because of some limitations and missing functions. In our view, the Matlab Toolbox is limited in terms of determining the quality criteria of a model: it allows you to determine only the classification accuracy criterion.
However, we intend to conduct a fine-grained comparison of the selected models.
For this purpose, we will need criteria that evaluate classification errors, for example the Mean Absolute Error (MAE) and the Mean Squared Error (MSE).
As a result, a special program was created, the purpose of which was to carry out a fine-grained comparison of the models using various criteria: the accuracy of the classification and the additional criteria MAE (Mean Absolute Error) and MSE (Mean Squared Error).
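The three criteria reduce to a few lines of NumPy; the predicted and true ratings below are toy values, not the program's output.

```python
import numpy as np

y_true = np.array([100, 75, 50, 10, 100, 75])
y_pred = np.array([100, 70, 50, 25, 95, 75])

accuracy = float(np.mean(y_true == y_pred))        # share of exact class hits
mae = float(np.mean(np.abs(y_true - y_pred)))      # Mean Absolute Error
mse = float(np.mean((y_true - y_pred) ** 2))       # Mean Squared Error
print(accuracy, mae, mse)
```

MAE is what is compared below against the conventional boundary between classes, equal to 5.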

Method 1. Classification and Regression Trees: Bagged Trees. The results of building the model are shown in figure 6 as a Confusion Matrix Graph. This approach, namely taking the previous year into account, significantly improved the accuracy of the model: the accuracy of classification increased from 72% to 75.9%. When analyzing the matrix, we also see a significant reduction in the deviations; the points deviate much less from the true value. The absolute error was 1.8666, which does not exceed the conventional boundary between classes, equal to 5.

Method 3. Support Vector Machine: Cubic SVM. This approach failed to improve the accuracy of the classification significantly: it increased only slightly, from 66.1% to 67.1%. When analyzing the matrix, we did not notice any obvious improvement with respect to deviations. The absolute error was 8.6970, which indicates a significant deviation from the true value in the case of an erroneous classification.

Method 4. k-nearest neighbors: Fine k-NN. This approach significantly improved the accuracy of the model: the accuracy of classification increased from 72.4% to 77.77%. When analyzing the matrix, we again see a significant reduction in the deviations; the points deviate much less from the true value. The absolute error was 1.7839, which does not exceed the conventional boundary between classes, equal to 5.

Using the methodology that takes the previous year into account, we may conclude that, in general, the models have been enhanced significantly.
In particular, the following models showed the best results:

Bagged Trees
The accuracy of the classification has increased significantly, namely by 6%. Moreover, using our own algorithm, we know the value of the mean absolute error, which equals 1.86. Therefore, we may conclude that the deviation is very low, because it does not exceed the conventional boundary between classes, equal to 5.

Fine k-NN
The next model that also showed excellent results is Fine k-NN. The accuracy of the classification has increased significantly, by approximately 6%. Moreover, using our own algorithm, we know the value of the mean absolute error, which equals 1.78. Therefore, we may conclude that the deviation is very low, because it does not exceed the conventional boundary between classes, equal to 5. Unfortunately, Quadratic Discriminant failed this test.

Cubic SVM
Cubic SVM has also enhanced its results, but insignificantly. The accuracy of the classification has increased by only 1%, which suggests that this method does not fit this approach. Furthermore, the value of the mean absolute error is too high, specifically 8.69, which exceeds the conventional boundary between classes, equal to 5.

Enhancing of k-NN methodology
In order to improve the accuracy of the classification, besides the Euclidean metric, we also verified other ones. As a result, the metric