Espacios. Vol. 35 (Nº 6) Año 2014. Pág. 4
Formality or informality: a choice based on individual characteristics
Formalidade ou informalidade: a escolha com base em características individuais
Received: 21/03/14 • Approved: 05/05/14
Informality is an important economic phenomenon to investigate because it affects the economy as a whole. It hampers the state's ability to raise revenue, since taxes are not collected on informal work. A large informal sector also restricts access to credit for businesses. For example, Dabla-Norris and Koeda (2008) found evidence that informality is robustly and significantly associated with lower access to and use of bank credit and with a higher dependence on informal sources of financing.
The model in this paper describes an economy with two types of economic agents: firms and workers. Government is treated as an exogenous agent, that is, the factors linked to government and institutions are taken as given. Firms are heterogeneous in their managerial ability. They produce formally or informally depending on profit maximization and on personal background characteristics:
$F = f(\pi_F, \pi_I, X)$
where F is the entrepreneur's choice between the formal and informal sector, $\pi_F$ is the profit of a formal entrepreneur, $\pi_I$ is the profit of an informal entrepreneur and X are individual background characteristics.
$\pi_F = P f(a, l) - w_f\, l\,(1 + t) - T$
where P is the price of the good produced by the firm, f(a, l) is the production function (with inputs a, managerial skills, and l, units of labour), $w_f$ is the wage per unit of labour paid to workers in the formal sector, t is the tax rate applied to wages, so that $w_f\, l\,(1 + t)$ is the total labour cost including taxes, and T is the fixed cost incurred by firms that operate formally.
$\pi_I = (P f(a, l) - w_i\, l)(1 - q)$
where $w_i$ is the wage per unit of labour paid to workers in the informal sector and q is the probability that a firm is caught operating informally.
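The firm's comparison of the two profit functions can be sketched numerically. The example below is only illustrative: the paper does not specify f(a, l), so a Cobb-Douglas-type form is assumed, all parameter values are hypothetical, and the background characteristics X are ignored.

```python
# Illustrative sketch only. The functional form of f(a, l) and every
# numerical value below are assumptions, not taken from the paper.

def production(a, l, alpha=0.5):
    """Hypothetical production function f(a, l) = a * l**alpha."""
    return a * l ** alpha

def profit_formal(P, a, l, w_f, t, T):
    """pi_F = P f(a, l) - w_f l (1 + t) - T."""
    return P * production(a, l) - w_f * l * (1 + t) - T

def profit_informal(P, a, l, w_i, q):
    """pi_I = (P f(a, l) - w_i l) (1 - q)."""
    return (P * production(a, l) - w_i * l) * (1 - q)

def preferred_sector(P, a, l, w_f, w_i, t, T, q):
    """Sector with the higher profit (background characteristics X ignored)."""
    pi_f = profit_formal(P, a, l, w_f, t, T)
    pi_i = profit_informal(P, a, l, w_i, q)
    return "formal" if pi_f >= pi_i else "informal"

# Two hypothetical firms that differ only in managerial ability a.
common = dict(P=1.0, l=4.0, w_f=0.5, w_i=0.4, t=0.3, T=1.0, q=0.2)
low_ability = preferred_sector(a=2.0, **common)
high_ability = preferred_sector(a=10.0, **common)
print(low_ability, high_ability)
```

Under these assumed values the low-ability firm prefers the informal sector and the high-ability firm prefers the formal one, because the fixed cost T weighs less on a larger output while the expected loss from detection grows with revenue.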
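Workers, in turn, choose their sector by comparing utilities, again conditional on personal background characteristics:
$F = f(U_F, U_I, X)$
where F is the worker's choice between the formal and informal sector, $U_F$ is the utility of a formal worker, $U_I$ is the utility of an informal worker and X are individual background characteristics.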
$U_F = w_f\, l + B - g$
where B denotes the government benefits attached to formal work and g is the fixed cost of working in the formal sector.
$U_I = w_i\, l\,(1 - q)$
The term q appears because workers do not receive their payment when an informal firm is detected.
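The worker's comparison can be illustrated in the same way. The sketch below evaluates the two utility functions and the detection probability at which a worker is indifferent between sectors, obtained by setting the formal utility equal to the informal one. All numerical values are hypothetical and the background characteristics X are again ignored.

```python
# Illustrative sketch only: all parameter values are hypothetical.

def utility_formal(w_f, l, B, g):
    """U_F = w_f l + B - g."""
    return w_f * l + B - g

def utility_informal(w_i, l, q):
    """U_I = w_i l (1 - q)."""
    return w_i * l * (1 - q)

def indifference_q(w_f, w_i, l, B, g):
    """Detection probability q* that solves U_F = U_I."""
    return 1 - utility_formal(w_f, l, B, g) / (w_i * l)

q_star = indifference_q(w_f=1.0, w_i=1.2, l=40, B=5, g=3)
print(q_star)  # the worker prefers the formal sector when q exceeds this value
```

With these assumed values the informal wage premium is offset once the detection probability exceeds q*, which is one way to read how enforcement enters the worker's decision.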
Many papers have focused on the relation between profits and utilities in the formal and informal sectors. In that view, the choice of entrepreneurs and workers between the two sectors is predominantly determined by obtaining a larger profit or utility in one sector than in the other.
For example, De Soto (1989) pointed out that a heavy load of taxes, bribes and bureaucratic requirements reduces the incentives (profits and utilities) to produce and work in the formal sector. In terms of our model, he focused on how t, T and g affect π and U.
This paper, instead, focuses on how personal background characteristics (X) affect the entry of individuals into the formal or informal sector, taking as given the factors linked to government and institutions.
The main data source used in this paper is the PNAD (National Household Survey), a survey conducted by the Brazilian Institute of Geography and Statistics. Every year the PNAD collects, on a permanent basis, general characteristics of the population such as education, labor, income and housing, and collects other characteristics with varying regularity. This paper uses the 2012 survey.
The PNAD follows a complex sampling design: a stratified, clustered design with one, two or three selection stages, depending on the stratum (Silva et al., 2002). This paper takes the complex sampling plan of the survey into account in the estimation.
The data extracted from this survey cover the job condition of individuals and the following characteristics: gender, age, migrant status, years of schooling, skin color, child, sector of employment and living in an urban area.
The variables gender, age and migrant were not adapted. Gender is equal to 1 if the individual is female and 0 if the individual is male. It is worth noting that, according to Ramalho and Silveira Neto (2010), men are more likely to be self-employed in the informal sector, while women are more likely to hold informal salaried jobs.
The variable migrant indicates whether the individual has lived in another Brazilian federal state or in another country at some point in life. Migrants may be more inclined to accept any work, including informal work, or to start a new firm (formal or informal).
The variable child records whether the individual has a child under the age of 18 and reflects the greater urgency of finding a job for individuals who are parents. It is meant to capture whether the individual has a family and therefore needs to work to support dependents.
Years of schooling can be an important determinant of an individual's choice between the formal and informal sector. Mello and Santos (2009) find that education levels are, at any point in time, the main individual characteristic determining to which of the two economic sectors (formal and informal) workers belong. They conclude that the improvement in the distribution of education in the population is the main driver of the increase in the degree of formalization of economies.
In our paper, education enters the model in three ways: first, through the profit functions, by improving managerial skills (a), in the case of entrepreneurs only; second, through wages (w), since education improves human capital; finally, through personal background characteristics (X), because individuals with more schooling know more about the laws, rules and procedures for formalizing themselves or their firms.
The data on skin color were divided into White and East-Asian people on one side and Black, Brown and Amerindian people on the other. This division is justified by the fact that, in Brazil, Black, Brown and Amerindian people have always been socially disadvantaged and have had fewer opportunities than Whites and East-Asians.
Thus, we can expect a larger presence of Blacks, Browns and Amerindians in informal jobs and firms. Saboia and Saboia (2006) documented the unfavorable situation of Black and Brown workers relative to Whites in the Brazilian labor market. In the population as a whole, White workers receive about double the income of Black and Brown workers; when only workers with a degree are considered, the salary differential is 15%.
The data on sector of employment were adapted to divide people into those who work in agriculture, in industry and in services. Belonging to sectors characterized by a high degree of informality (agriculture and services) or to a sector more intensive in formal jobs (industry) can change the degree of formality in the labor market. For instance, structural change in the sectoral composition explained 25% of the increase in the degree of informality observed throughout the 1990s in Brazil (Ulyssea, 2006, 2010).
The data on the region where the individual lives were adjusted to distinguish people who live in urban areas from people who live in rural areas. This distinction can be useful because the type of informality can differ between urban and rural individuals.
The variables used in this work are listed in table 1, which also summarizes the values that each variable can take.
Table 1. Description of variables
Source: Own elaboration. Extracted from PNAD 2012
Following De Mel et al. (2010) and Bruhn (2012), this paper uses discriminant analysis to classify formal and informal individuals into wage-worker and business-owner species. As described in De Mel et al., discriminant analysis is a tool used in other sciences, such as biology, to separate elements of nature into species based on measured characteristics.
To check whether the chosen variables are relevant for separating the four groups through discriminant analysis, we first analyze the means and standard deviations of the variables in each group defined by the variable formalinf.
Table 2 displays averages and standard deviations of the personal background characteristics by occupation group. The statistics show that women are more present among workers than among entrepreneurs and, in particular, that they are more heavily employed in informal jobs than in formal jobs.
Table 2 also shows that informal workers are the youngest group (34.28 years on average), followed by formal workers (35.91 years); the two groups of entrepreneurs are older: 43.52 years for formal entrepreneurs and 44.07 years for informal entrepreneurs. This pattern may be linked to the life cycle: individuals are workers in the early stages of life and become entrepreneurs in later stages, once they have accumulated experience and savings.
The variable migrant varies only slightly among groups. By contrast, the variable schoolingy differs markedly across groups: informal workers and entrepreneurs have a lower average level of schooling than formal workers and entrepreneurs. The largest difference occurs between informal and formal entrepreneurs, where formal entrepreneurs have, on average, nearly twice as many years of schooling as informal ones. This difference could be explained by the greater difficulties that low-schooling entrepreneurs face in formalizing their firms, given the complexity of laws, taxes and regulations. Alternatively, it could reflect an attempt by low-schooling entrepreneurs to compete through tax evasion, since they cannot compete through human capital.
Turning to skincol, Blacks, Browns and Amerindians are, on average, strongly represented in the informal sectors of the economy: they account for 63% of informal workers and 61% of informal entrepreneurs, against 50% of formal workers.
As for having at least one child born after 1994, table 2 shows that formal and informal entrepreneurs are, on average, less likely to have one than the two groups of workers. Moreover, among workers, informal ones are slightly more likely to have a child than formal ones, while among entrepreneurs the opposite holds.
For agrindser, table 2 shows that, on average, formal workers are more present in industry than informal workers, while formal entrepreneurs are less active in industry than informal entrepreneurs.
The last variable, urban, indicates that urban individuals are in general more present in the formal groups than in the informal ones, with the lowest share among informal entrepreneurs.
Table 2. Personal background characteristics by occupation group:
Source: Own elaboration. Extracted from PNAD 2012.
To observe the sign of each independent variable in classifying the dependent variable, a multinomial logistic regression was estimated with the software STATA. Multinomial logistic regression is a maximum likelihood model for a discrete dependent variable that takes more than two outcomes with no natural ordering.
In the multinomial logit model, one set of coefficients is estimated for each outcome: β(1), β(2), β(3) and β(4).
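In the standard formulation of the multinomial logit, which is assumed here, the probability that an individual with characteristics X belongs to group j is given by the expression below. The model is identified only after one coefficient vector is normalized to zero; taking formal workers (group 1) as the base outcome is an assumption of this sketch, suggested by the fact that the results are discussed for groups 2, 3 and 4.

```latex
\Pr(y = j \mid X) = \frac{\exp\!\big(X\beta^{(j)}\big)}{\sum_{k=1}^{4} \exp\!\big(X\beta^{(k)}\big)},
\qquad j = 1, \dots, 4,
\qquad \beta^{(1)} = 0
\;\Rightarrow\;
\Pr(y = 1 \mid X) = \frac{1}{1 + \sum_{k=2}^{4} \exp\!\big(X\beta^{(k)}\big)} .
```

Under this normalization each estimated coefficient describes the effect of a variable on the log-odds of belonging to group j relative to the base group, not on the probability itself.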
Some aspects of the data are taken into account in estimating the multinomial logistic regression. The first is that the National Household Survey (PNAD) follows a complex sampling plan. The structure of the sampling plan is first specified in STATA and then taken into account in the multinomial logistic regression.
The second aspect is sample selection.
The dependent variable (formalinf) takes values only for employed people; unemployed people are excluded by construction. This selection is corrected through the inverse Mills ratio. The inverse Mills ratio is calculated manually from a probit regression in which the dependent variable indicates whether the individual works and the independent variables are child, skincol, schoolingy, gender, age and a further variable indicating whether the individual receives income from sources other than work. The inverse Mills ratio is computed as the ratio of the probability density function to the cumulative distribution function and is then included in the multinomial logistic regression.
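The calculation of the inverse Mills ratio from a fitted probit index can be sketched as follows. This is a minimal illustration rather than the estimation actually run in STATA: the coefficient values and the individual's characteristics are hypothetical, and only a subset of the probit regressors is used.

```python
import math

def normal_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills_ratio(z):
    """Ratio pdf(z) / cdf(z), evaluated at the fitted probit index z."""
    return normal_pdf(z) / normal_cdf(z)

# Hypothetical probit coefficients for the employment equation
# (assumed values, not estimates from the paper).
coef = {"const": -0.5, "schoolingy": 0.08, "age": 0.01}
person = {"schoolingy": 11, "age": 35}

index = coef["const"] + sum(coef[name] * value for name, value in person.items())
imr = inverse_mills_ratio(index)
print(round(index, 2), imr)
```

The ratio falls as the index rises, so individuals who are very likely to be employed receive a small correction term, while those who are unlikely to be employed receive a large one.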
Table 3. Multinomial logistic regression (coefficients).
Source: Own elaboration
The multinomial logistic regression shows that nearly all variables are statistically significant at the 1% level; gender in the group of formal entrepreneurs is significant at 5%. The variable child in groups 2 and 3 and the variable schoolingy in group 3 are statistically insignificant. In these cases, the variables in question do not determine the choice of individuals to belong to a group.
Considering the coefficients, and β(2) in particular, the variables age, schoolingy and urban have a negative effect on belonging to the group of informal workers. That is, a higher age, more years of schooling and living in an urban area reduce the probability of belonging to the informal workers group.
In particular, living in an urban area has a larger negative association than the other two variables. All the other independent variables are positively associated with the probability of belonging to the informal workers group, except child, which is statistically insignificant. Specifically, being employed in agriculture or services (agrindser) and being a woman (gender) have larger effects than the other variables; in any case, all variables have an important effect. In this group, all coefficients have the sign predicted by theory.
Turning to the group of formal entrepreneurs, being Black, Brown or Amerindian (skincol) and being a woman (gender) are negatively related to belonging to this group. In particular, skincol has a large negative effect. By contrast, the variables age, migrant, agrindser and urban have a positive effect: an individual is more likely to enter this group if he or she is older, is a migrant, works in agriculture or services, and lives in an urban area. The significant results accord with our expectations, except for the sign of agrindser, which was expected to be negative. We can suppose that the effect of agrindser on entrepreneurs is the inverse of its effect on workers, contrary to what was expected. The industrial sector could be more formalized than the agricultural and services sectors in the recruitment of employees, yet less formalized in terms of the number of entrepreneurs registered in the National Register of Legal Entities.
Moreover, the level of education is not statistically significant; that is, the choice of being a formal entrepreneur is not determined by the level of education.
The last group to be considered is that of informal entrepreneurs. Here schoolingy, skincol, child, agrindser and urban have a negative effect on belonging to the group. Individuals who have studied for more years, are Black, Brown or Amerindian, have at least one child born after 1994, work in agriculture or services, or live in urban areas are less likely to enter the group of informal entrepreneurs. By contrast, being female, being older and being a migrant are positively related to being an informal entrepreneur.
An interesting point of these results is that being Black, Brown or Amerindian (skincol) is negatively associated with both groups of entrepreneurs, while age is positively associated with both groups, although with a small coefficient.
To investigate the groups according to the type of work or enterprise, a canonical discriminant analysis was carried out in STATA.
Canonical discriminant analysis is developed in this paper to obtain the relative importance of each variable in the explanation of informality.
Canonical discriminant analysis derives a linear combination of the variables that has the highest possible multiple correlation with the groups. This maximum multiple correlation is called the first canonical correlation. The coefficients of the linear combination are the canonical coefficients. The variable defined by the linear combination is the first canonical variable. The second canonical correlation is obtained by finding the linear combination uncorrelated with the first canonical variable that has the highest possible multiple correlation with the groups. The process of extracting canonical variables can be repeated until the number of canonical variables equals the number of original variables or the number of groups minus one, whichever is smaller. Thus, in this work, there are three canonical variables, since there are four groups.
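The count of canonical variables can be written out explicitly. Taking p as the number of predictors (the eight background characteristics: gender, age, migrant, schoolingy, skincol, child, agrindser and urban) and g as the number of groups, the rule stated above gives:

```latex
\text{number of canonical variables} = \min(p,\; g - 1) = \min(8,\; 4 - 1) = 3 .
```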
Discriminant analysis involves determining a linear equation that predicts the group to which an individual belongs. The function has the form:
$D = v_1 X_1 + v_2 X_2 + v_3 X_3 + \dots + v_i X_i + a$
where D is the discriminant function, $v_1, \dots, v_i$ are the discriminant coefficients (weights) of the predictor variables $X_1, \dots, X_i$, and a is a constant.
The objective of this function is to maximize the distance between groups, that is, to obtain an equation with strong discriminatory power between groups.
Before applying canonical discriminant analysis, we have to test its assumptions: sample size, normal distribution, homogeneity of variances/covariances, absence of outliers and non-multicollinearity (Poulsen and French, n.d.).
The first assumption concerns sample size: the smallest group must contain more observations than there are predictor variables. In this paper the assumption is comfortably satisfied, since the smallest group is very large.
The normality assumption states that the data (for the variables) represent a sample from a multivariate normal distribution. To test it we use the Doornik-Hansen multivariate normality test, which does not reject the hypothesis of normality at the 1% significance level.
The third assumption is homogeneity of variances/covariances. To test it we use the test of equality of covariance matrices across the four groups. The hypothesis of equality cannot be rejected.
The fourth assumption concerns outliers, to which discriminant analysis is highly sensitive. We test for univariate and multivariate outliers in each group and eliminate them.
The last assumption concerns multicollinearity: if one of the independent variables is very highly correlated with another, the matrix will not have a unique discriminant solution. We check this by computing variance inflation factors (VIFs) for each independent variable. All VIFs are below 10, so we conclude that multicollinearity is not present in the model.
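For reference, the usual definition of the variance inflation factor makes the threshold of 10 easier to read. Here $R_j^2$ denotes the coefficient of determination from regressing the j-th independent variable on all the others (notation introduced for this note):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j < 10 \iff R_j^2 < 0.9 .
```

A VIF below 10 therefore means that less than 90% of the variation in any one characteristic is explained linearly by the remaining characteristics.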
Table 4. Canonical linear discriminant analysis
Source: Own elaboration
As the canonical-correlation table (table 4) shows, the first linear discriminant function accounts for almost 55% of the variance and the second for almost 41%, so that together they account for approximately 96%. The remainder of the analysis considers only the first function.
An F test is used to test the null hypothesis that the covariance matrices do not differ between the groups formed by the dependent variable, because the basic assumption is that the variance-covariance matrices are equivalent. If the test is not significant, as is the case for the three variables in this work, the null hypothesis that the groups do not differ can be retained.
Canonical correlation explains total correlation between the predictors and the discriminant function. More interesting is studying the partial correlation of each variable with the function through standardized canonical discriminant function coefficients.
Table 5. Standardized canonical discriminant function coefficients
Source: Own elaboration
The standardized canonical discriminant function coefficients, shown in table 5, indicate the discriminating ability of each variable for the four groups.
$D_1 = -0.333\,\text{gender} + 0.550\,\text{age} - 0.030\,\text{migrant} + 0.821\,\text{schoolingy} - 0.227\,\text{skincol} + 0.102\,\text{child} - 0.077\,\text{agrindser} + 0.199\,\text{urban}$
Table 5 also shows the ranking of variables by discriminant ability. Schoolingy, age and gender are the most important variables in discriminating individuals among the four groups, followed in order of importance by skincol and urban.
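The ranking can be reproduced directly from the coefficients of the first function reported above: since the coefficients are standardized, the variables are ordered by the absolute value of their coefficient. The short sketch below does only that and assumes nothing beyond the reported figures.

```python
# Standardized coefficients of the first canonical discriminant function,
# as reported in the text.
coefficients = {
    "gender": -0.333,
    "age": 0.550,
    "migrant": -0.030,
    "schoolingy": 0.821,
    "skincol": -0.227,
    "child": 0.102,
    "agrindser": -0.077,
    "urban": 0.199,
}

# Rank variables by the absolute size of their standardized coefficient.
ranking = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)

for position, name in enumerate(ranking, start=1):
    print(position, name, coefficients[name])
```

The sign of each coefficient is ignored in the ranking because it indicates the direction of the contribution to the function, not its strength.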
Table 6. Group means on canonical variables
Source: Own elaboration
Table 6 shows the mean of each group on the canonical variables, which gives some indication of how the groups are separated. On the first function, the mean of formal workers (1) is very distant from the mean of informal workers (2); the same occurs between formal entrepreneurs (3) and informal entrepreneurs (4).
The next output useful for assessing the discriminant analysis (table A1 in the Appendix) is the confusion matrix, or resubstitution classification table. It indicates how many observations from each group are classified correctly or misclassified into the other groups. In each cell, the upper value is the number of individuals and the lower value is the corresponding percentage.
In table A1, the best classification occurs in group 3 (formal entrepreneurs), and group 2 (informal workers) also obtains a relatively good classification (50.72%). By contrast, the classification of group 1 (formal workers) is poor: many of its individuals are misclassified into groups 2 and 3.
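The way such a table is read can be illustrated with a small sketch. The counts below are hypothetical and are not the figures of table A1; rows are the true groups and columns the predicted groups, so the diagonal holds the correctly classified observations.

```python
# Hypothetical confusion matrix (NOT the values of table A1).
# Rows: true group 1..4; columns: predicted group 1..4.
counts = [
    [40, 30, 20, 10],
    [20, 50, 10, 20],
    [10, 10, 70, 10],
    [15, 25, 15, 45],
]

def row_percentages(matrix):
    """Share of each true group assigned to each predicted group, in percent."""
    return [[100.0 * n / sum(row) for n in row] for row in matrix]

def correct_rates(matrix):
    """Percentage of each true group that is classified correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]

for group, rate in enumerate(correct_rates(counts), start=1):
    print(f"group {group}: {rate:.2f}% correctly classified")
```

Because resubstitution classifies the same observations that were used to estimate the functions, these rates tend to be optimistic compared with classification of new data.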
In conclusion, through canonical discriminant analysis, the four groups can be separated using independent variables chosen for this purpose.
The most important variables to divide the four groups are schoolingy, age and gender.
The study investigated the factors that may determine an individual's choice between the formal and informal sector in Brazil, considering individual characteristics.
Multinomial regression anticipates the results of the discriminant analysis and also provides the sign of the association between characteristics and group membership. Most variables are statistically significant and have the sign implied by the initial assumptions.
The most important characteristics are, in order, years of schooling, age, gender and skin color. Among these, we underline first the importance of years of schooling, which is also highlighted in the literature on the recent decrease in Brazilian informality (Mello and Santos, 2009). Investing in human capital lowers the incentive for individuals to be informal in the job market or in entrepreneurship. Individuals with a higher level of education can recognize the advantages of being formal: for instance, participation in partnerships, class associations and unions, access to credit, freedom from the risk of confiscation and access to social welfare. Moreover, individuals with more schooling can more easily find out how to formalize their own firm or how to find a formal job in the market.
The variable age proves to be a discriminant variable, but it appears to describe the life cycle of individuals, distinguishing between workers and entrepreneurs, more than it discriminates between formal and informal sectors.
Gender seems to have a role in discriminating between formal and informal sectors for both workers and entrepreneurs.
The role of the variable skin color is less clear: we can suppose that it differentiates individuals both at level of worker/entrepreneur and at level of formal/informal sector.
Finally, it should be noted that this paper considers only individual characteristics to explain informality, taking institutional characteristics, such as the level of taxation and public oversight of informality, as given.
Table A1. Resubstitution classification summary
Source: Own elaboration
Almeida, R., Carneiro, P. (2009); "Enforcement of labor regulation and firm size". Journal of Comparative Economics, 37 (1), 28 – 46
Bruhn, M. (2011); "License to sell: The effect of business registration reform on entrepreneurial activity in Mexico". Review of Economics and Statistics 93 (1), 382–386
Bruhn, M. (2012); "A Tale of Two Species: Revisiting the Effect of Registration Reform on Informal Business Owners in México", Policy Research Working Paper n. 5971, World Bank
Burns, R., Burns, R. (2008); Business Research Methods and Statistics using SPSS, London: SAGE Publications
Cameron, A. C., Trivedi, P. K. (2005); Microeconometrics: Methods and Applications, Cambridge: Cambridge University Press
Dabla-Norris, E., Koeda, J. (2008); "Informality and Bank Credit: evidence from firm-level data", IMF working paper, Washington
De Mel, S., McKenzie, D., Woodruff, C. (2010); "Who Are the Microenterprise Owners? Evidence from Sri Lanka on Tokman v. de Soto", in J. Lerner and A. Schoar (eds.) International Differences in Entrepreneurship, 63-87.
De Mel, S., McKenzie, D., Woodruff, C. (2012); "The demand for, and consequences of, formalization among informal firms in Sri Lanka". World Bank Policy Research Working Paper N.5991
De Soto, H., (1989); The Other Path: The Invisible Revolution in the Third World, Harper Row, New York
Fajnzylber, P., Maloney, W. F., Montes-Rojas, G. V. (2011); "Does formality improve micro-firm performance? Evidence from the Brazilian simples program". Journal of Development Economics 94 (2), 262 – 276
Galiani, S., Weinschelbaum, F. (2006); "Modeling Informality Formally: Households and Firms", Centro de Estudios Distributivos, Laborales y Sociales, Documento de Trabajo Nro. 47, Universidad Nacional de La Plata
Hsieh, C.-T., Klenow, P. J. (2009); "Misallocation and manufacturing TFP in China and India". Quarterly Journal of Economics 124 (4), 1403 – 1448.
Kaplan, D. S., Piedra, E., Seira, P. (2011); "Entry regulation and business start-ups: Evidence from Mexico". Journal of Public Economics 95 (11-12): 1501–1515
Mello, R.F., Santos, D.D. (2009); "Aceleração educacional e a queda recente da informalidade". IPEA, Boletim Mercado de Trabalho 39
Monteiro, J. C., Assuncao, J.J. (2012); "Coming out of the shadows? estimating the impact of bureaucracy simplification and tax cut on formality in Brazilian microenterprises". Journal of Development Economics 99, 105-115
Neri, M.C. (2007); "Informalidade". In: Tafner P, Giambiagi F, organizadores. Previdência no Brasil: debates, dilemas e escolhas. Rio de Janeiro: Ipea; p. 285-319
Poulsen, J., & French A. "Discriminant Function Analysis", Retrieved from:
Ramalho, H. M. B., Silveira Neto, R.M., (2010); "A importância do setor informal na migração rural-urbana: evidencias para o Brasil". Encontro Nacional ANPEC.
Saboia, A. L., Saboia, J. (2006); "Brancos, Pretos e Pardos no Mercado de Trabalho no Brasil Um Estudo sobre Desigualdades", Instituto de economia, UFRJ
Silva, P. L. do N., Pessoa, D. G. C., Lila, M. F., (2002); "Análise estatística de dados da PNAD: incorporando a estrutura do plano amostral", Ciência Saúde Coletiva, vol.7, no.4, 659-670
Ulyssea, G. L., (2006); "Informalidade no mercado de trabalho brasileiro: uma resenha da literatura", Revista de Economia Política, vol. 26, nº 4 (104), 596-618
Ulyssea, G. L. (2010); "Regulation of entry, labor market institutions and the informal sector". Journal of Development Economics 91, 87–99
Ulyssea, G. L. (2013); "Formal sector's entry costs, taxes, enforcement and
1PhD student in Development Economics (PPGDE) Federal University of Paraná (UFPR), Brazil email@example.com