LUIS MUELLE
Educational consultant
Abstract. This study applies a logistic multilevel analysis to the test results of Peruvian students in science, mathematics, and reading in the PISA 2015 round. It focuses on measuring the impact of student socioeconomic and contextual factors on low academic performance. The socioeconomic status of the students and the socioeconomic composition of the school appear to be the main factors associated with poor performance. Other contextual factors, such as grade repetition, mother tongue, school size, grade, non-truancy, and gender, are also associated with low achievement. Non-contextual emotional factors, such as sense of belonging to a school, achievement motivation, and test anxiety, are likewise associated with academic risk.
Keywords: Programme for International Student Assessment; academic achievement; students; social conditions; Peru.
Acronyms and initials
AIC Akaike information criterion
BIC Bayesian information criterion
ESCS Economic, social, and cultural status of students
IDB Inter-American Development Bank
IRT Item response theory
MINEDU Ministry of Education, Peru
OECD Organization for Economic Cooperation and Development
PISA Program for International Student Assessment
UMC Learning Quality Measurement Office (Oficina de Medición de la Calidad de los Aprendizajes), Ministry of Education
UMCISE Socioeconomic index of the Learning Quality Measurement Office
UNESCO United Nations Educational, Scientific and Cultural Organization
WLE Weighted likelihood estimate
Introduction
There is a long tradition of studies on the influence of students’ social status on their performance in assessments across different disciplines and grade levels. In Peru, national sample-based and general (census) assessments at different stages of primary and secondary school, together with the country’s growing participation in international assessments, particularly PISA, have produced a large body of valuable data on the effectiveness and equity of the education system. Despite some progress (many of these assessments point to an expansion in the coverage of primary and secondary education in recent decades), test results remain modest and social inequalities persist (Cuenca et al., 2017; Miranda, 2008).
Peru has participated in four rounds of the Program for International Student Assessment (PISA) since the program’s inception in 2000. PISA evaluates the performance of 15-year-old school students, who are near the end of compulsory education, in tests covering the three subjects regarded as core: science, mathematics, and reading.2 The program is administered in three-year cycles, and each round places particular emphasis on one of the three subjects in rotation. PISA 2015 placed emphasis on scientific skills.
PISA aims to guide education policies by relating student outcomes in cognitive tests with their socioeconomic and cultural context, while also taking into account attitudes and motivations. Thus, the program seeks to support education policies in the improvement of learning by identifying best practices and strategies in participating countries.
For Peru and the other participants from Latin America, PISA 2015 results were modest without exception. In science, for example, Peru ranked in the bottom third of countries (BID-CIMA, 2016), with stark performance differences equivalent to more than 2.5 years of schooling under the OECD criterion that a difference of 30 score points in science is equivalent to about one year of schooling. The proportion of students who reach proficiency Level 2, considered the baseline proficiency that all students should attain, is also used as a criterion for comparison.
Table 1 presents the mean score in each subject and the percentage of students below Level 2 in the Latin American participants; the high percentage of students below this baseline level of proficiency across the three subjects is striking. These results compare unfavorably with the OECD averages of about 21% of students below Level 2 in science, 20% in reading, and 23% in mathematics.
Table 1
Latin America in PISA 2015: performance averages and percentages of students who obtain minimum proficiency levels in science, mathematics, and reading
Country         Science  % below L2  Mathematics  % below L2  Reading  % below L2
Chile           447      35          423          49          459      28
Uruguay         435      41          418          52          437      39
Costa Rica      420      46          400          62          420      40
Colombia        416      49          390          66          425      43
Mexico          416      48          408          57          423      42
Brazil          401      57          377          70          407      51
Peru            397      58          387          66          398      54
Dom. Republic   332      86          328          91          358      72
Source: compiled by the author, based on OECD (2016). Countries appear in descending order by scores in science.
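As a rough illustration of the OECD rule of thumb cited above (about 30 score points per year of schooling), the sketch below converts science gaps from Table 1 into approximate years of schooling. The helper name is hypothetical, and the calculation is simple arithmetic on the published means that ignores their standard errors.

```python
# Rule of thumb cited in the text: about 30 PISA score points ~ one year of schooling.
POINTS_PER_YEAR = 30.0

# Mean science scores from Table 1 (OECD, 2016).
SCIENCE = {
    "Chile": 447, "Uruguay": 435, "Costa Rica": 420, "Colombia": 416,
    "Mexico": 416, "Brazil": 401, "Peru": 397, "Dom. Republic": 332,
}


def years_of_schooling_gap(score_a: float, score_b: float) -> float:
    """Approximate years of schooling equivalent to the score gap a - b."""
    return (score_a - score_b) / POINTS_PER_YEAR


if __name__ == "__main__":
    leader = SCIENCE["Chile"]
    for country, score in SCIENCE.items():
        gap = years_of_schooling_gap(leader, score)
        print(f"{country:<14} {score}  {gap:.1f} years behind Chile")
```

Under this rule, Peru’s 50-point gap with Chile in science corresponds to roughly 1.7 years of schooling.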
This study draws heavily on the publicly available PISA 2015 database, which provides evidence on the factors that can influence academic achievement, including a large collection of synthetic indices compiled from questionnaire responses. The study has two aims: first, to measure the effects of contextual factors and the variability between schools and between students in each subject; and second, to analyze non-contextual (or process) factors. Both aims are pursued using hierarchical (multilevel) and binary logistic regression, which identify the factors most closely related to performance and those with the greatest explanatory power.3
Various official reports and studies have addressed the factors associated with performance, building up valuable knowledge on the topic. For Peru, however, there is little to no analysis of the factors associated with the achievement of the subpopulation of low-performing students, or of how they compare with high performers. Another underexplored issue is the relationship between these subpopulations and advantaged or disadvantaged socioeconomic status. Such analysis, of course, requires definitions of high and low performance and of advantaged and disadvantaged socioeconomic status. These are developed below.
Since the time of Coleman (1966), the academic literature on factors associated with performance has constantly stressed the importance of the socioeconomic status of students’ families.
At the international level, some studies (Hanushek & Woessmann, 2008) have explored, directly and indirectly, some of these family-related factors. They do not identify a single universal factor that explains differences in performance, but point to multiple interactions among these factors over the course of a student’s schooling. Sirin (2005), in his meta-analysis of 74 studies on the factors most influential for achievement, finds that socioeconomic status is among the strongest correlates of achievement. He notes that students with higher socioeconomic status typically obtain higher test scores and are more likely to complete secondary school and go on to university than their peers from more humble backgrounds.
In the Spanish-language literature, Cordero, Crespo, and Pedraja (2013) review PISA results for Spain between 2000 and 2009, observing that most studies identify student socioeconomic status and grade repetition as the main determinants of achievement, while factors associated with school resources have very low explanatory power. More recently, Gamazo et al. (2018), analyzing PISA 2015 data, report that the contextual factors with the greatest effect are gender, on-time enrollment, the socioeconomic level of the school and of the student, and grade repetition. However, they do not detect any significant relationships between non-contextual factors and school-level variables.
Focusing on student assessment in Peru, Agüero and Cueto (2004) argue that low performance levels are partially the result of peer effects within the classroom. The authors note the importance of these effects in designing policies aimed at boosting equality and quality, and propose the allocation of resources to improve performance. Another outstanding study of the Peruvian case is Cueto (2007), who focuses on the main antecedents, characteristics, and results of four national assessments and two international assessments that tackle the factors associated with performance in language and mathematics. He also discusses the challenges and opportunities related to the student performance evaluation system in Peru.
Carrasco (2007), in her investigation of Peruvian schools based on data from PISA 2000, finds that school socioeconomic level has a greater effect on student performance than the socioeconomic level of the individual. She observes that a school’s sector (state/non-state) does not explain performance once its socioeconomic composition is controlled for, and that school resources and equipment levels are also not significant. In general, the vast majority of school-level variables are not statistically significant in explaining the variation in these PISA scores.
Guadalupe and Villanueva (2013), addressing the evolution of reading performance in the Latin American countries that took part in PISA 2000 and 2009, enquire into the extent to which the changes observed in these assessments can be explained by a transformation in students’ socioeconomic characteristics. They propose that socioeconomic status be measured using a different procedure that is more sensitive to the new social context.
Benavides, León and Etesse (2014) analyze data from the PISA 2000 and 2009 assessments of reading comprehension in Peru and argue that performance gaps associated with socioeconomic differences have increased over time, aggravating the level of student segregation in schools. This effect is influenced by various sociodemographic factors and may indicate that schools are becoming spaces of social segregation. For their part, León and Youn (2016), analyzing Peru’s mathematics results in PISA 2012, highlight the significant effects that the disciplinary climate in class and a sense of belonging have on the performance gaps associated with social differences.
León and Collahua (2016), who examine the effect of socioeconomic level on the performance of Peruvian students over the past 15 years, note that current learning assessments tend to use synthetic indices that combine indicators of the economic, social, and cultural dimensions of the family. Their meta-analysis covers the period 2000-2014 and identifies 28 education studies that relate family socioeconomic level to academic performance. They find that the effect of school socioeconomic composition can be as much as seven times greater than that of student socioeconomic level, underscoring its central importance. However, the authors warn that these measures are based on a single grade level or class at each school, and point to the need to develop new school-level indicators to better identify levels of segregation. They also highlight the importance of using multilevel (hierarchical) models to better estimate these effects.
These prior studies constitute valuable reference points for analysis of the factors that influence performance. In this sense, Murillo (2007) distinguishes between input factors (gender, socioeconomic level, first language, teaching resources, and teachers at the school); process factors (study habits, academic expectations, family support, school climate, teaching methodology); and the output factor (student achievement in the test). This approach is complemented by the theoretical framework proposed by the OECD (OECD, 2016, p. 41) for selecting variables within a model that relates students (social index, language, gender, schooling, location, school sector, and attitudes and behavior) with schools (social composition, leadership, educational resources, and teachers), among other dimensions.
For the purposes of this study, the following variables have been selected at the student level: socioeconomic status; gender; home language; geographical location; ontime enrollment; repetition; preschool attendance; and attitudes and behavior toward and at school. At the school level: socioeconomic composition; teaching resources; school climate; and teaching practices. At the education system level: leadership; school size; autonomy; school sector; and student selection and guidance.
Certain caveats are necessary. First, PISA obviously captures only some of the multiple factors considered important in relation to the results. Second, PISA does not assess the entire curriculum, focusing only on the three subjects selected by common agreement of the countries participating in the program. Third, the cognitive tests are accompanied by contextual surveys that cover only part of the complex family, cultural, and social environments of the student and the school.
When interpreting the results, it should also be borne in mind that the PISA design does not include random or experimental assignment and therefore does not allow causal effects to be identified, although statistical associations can point to potential causal relationships. Finally, the test averages refer to the national level and can mask large differences related to the country’s administrative and geographic characteristics.
In sum, the results measured by the tests are the product of the student’s entire schooling experience and are thus cross-cutting; schooling, in turn, is a historical product of current and past education policies.
To achieve these aims, the analysis uses the PISA nationally representative sample of 15-year-old students enrolled in state and private secondary schools in urban and rural areas. At each school, 35 students of the corresponding age and grade level were selected. The sample comprised 6,971 students from 281 schools throughout the country. Scores are reported on the PISA scale, which was standardized to a mean of 500 points and a standard deviation of 100 points across OECD countries. Student proficiency was estimated with a two-parameter item response theory (IRT) model. In addition, questionnaires were administered to students, teachers, and principals at the selected schools.
In the PISA design, each student responds to only a subset of the items in each subject. This requires the plausible values methodology, which yields estimates that are consistent with the characteristics of the tested population. Plausible values, obtained through an imputation process, are intended to represent each student’s proficiency. In PISA 2015, ten plausible values are assigned to each student in each subject, so every analysis of scores must take all ten into account. Ignoring them would seriously bias the standard errors and significance tests and could lead to erroneous conclusions. For an extensive discussion of these values, see Von Davier, Gonzales and Mislevy (2009). In this study, all calculations involving students’ scores in each subject use the ten plausible values.
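The usual way to "take the ten values into account" is to compute the statistic once per plausible value and then pool the results with Rubin’s combining rules. The sketch below shows that pooling step only; the function name is hypothetical, and the per-value sampling variances are taken as given (in PISA they would come from the replicate weights, which are not shown here).

```python
import math
import statistics


def combine_plausible_values(estimates, sampling_variances):
    """Pool a statistic computed once per plausible value (Rubin's combining rules).

    estimates: the statistic (e.g., a mean or a slope) computed with each plausible value.
    sampling_variances: the squared standard error of each of those estimates. In PISA
        these would come from the replicate weights; here they are simply taken as given.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need >= 2 plausible values and one sampling variance per value")
    pooled = statistics.fmean(estimates)            # average of the M estimates
    within = statistics.fmean(sampling_variances)   # average sampling variance
    between = statistics.variance(estimates)        # imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between          # total error variance
    return pooled, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical means from ten plausible values, each with a standard error of 2.5.
    means = [386.1, 387.0, 386.4, 386.9, 386.2, 386.8, 386.5, 386.7, 386.3, 387.1]
    print(combine_plausible_values(means, [2.5 ** 2] * len(means)))
```

The imputation term grows when the plausible values disagree, which is why analyzing a single plausible value understates the standard error.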
Multilevel models are used for the data analysis. These are mixed models because they contain both fixed and random effects. The fixed effects are analogous to ordinary regression coefficients and are estimated directly; the random effects are summarized by their estimated variances and covariances. Random effects can take the form of random intercepts or random coefficients in a grouped data structure that may include several levels of nested groups. Mixed models of this kind are also known in the literature as multilevel or hierarchical models.
There are two main reasons for using multilevel models. First, students attending the same school form a cluster: classmates share the same physical space and teachers, so their outcomes are related. Standard regression tends to bias the standard errors in this situation, because it wrongly assumes that the observations are independent, a basic assumption of linear regression. For this reason, multilevel models are used in education for their ability to incorporate the hierarchical structure of the data (Raudenbush & Bryk, 2002). Second, multilevel models estimate the patterns of variation within and between institutions simultaneously. They measure the extent to which performance reflects differences in the context in which schools operate, as opposed to differences arising from students’ family and personal characteristics. Multilevel mixed-effects linear regression is applied when the dependent variable is continuous, and multilevel mixed-effects logistic regression when it is binary.
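To make the two kinds of model concrete, the equations below give a generic two-level specification with students $i$ nested in schools $j$. The notation is illustrative (one student-level covariate $x_{ij}$ and one school-level covariate $z_j$) and is not taken from the author’s models. The first line is the null linear model, whose intraclass correlation $\rho$ splits the score variance into between-school and within-school parts; the second is a random-intercept logistic model for a binary outcome such as low performance, for which a latent-scale intraclass correlation is commonly computed by fixing the student-level variance at $\pi^2/3$.

```latex
\[
y_{ij} = \gamma_{00} + u_{0j} + e_{ij},\qquad
u_{0j} \sim \mathcal{N}(0,\tau_{00}),\quad e_{ij} \sim \mathcal{N}(0,\sigma^2),\qquad
\rho = \frac{\tau_{00}}{\tau_{00}+\sigma^2}
\]
\[
\operatorname{logit}\Pr(y_{ij}=1 \mid x_{ij}, z_j, u_{0j})
  = \beta_0 + \beta_1 x_{ij} + \beta_2 z_j + u_{0j},\qquad
u_{0j} \sim \mathcal{N}(0,\tau_{00}),\qquad
\rho_{\text{latent}} = \frac{\tau_{00}}{\tau_{00} + \pi^2/3}
\]
```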
In this study, the multilevel model parameters are always estimated using the student and school weights included in the database. As is common in mixed models, the student weights are rescaled by dividing each weight by the mean weight of its cluster, in this case the school the student attends (Rabe-Hesketh & Skrondal, 2012).
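A minimal sketch of that rescaling, assuming the student weights and school identifiers are available as two parallel lists. The function name is hypothetical, and only the student-level weights are handled; school-level weights would be passed to the model unchanged.

```python
from collections import defaultdict


def rescale_weights_by_cluster(weights, cluster_ids):
    """Divide each student weight by the mean weight of the student's school.

    weights: final student weights, one per student.
    cluster_ids: school identifier for each student, aligned with `weights`.
    After rescaling, weights average 1 within each school, so they sum to the
    number of sampled students in that school.
    """
    if len(weights) != len(cluster_ids):
        raise ValueError("weights and cluster_ids must have the same length")
    totals, counts = defaultdict(float), defaultdict(int)
    for w, c in zip(weights, cluster_ids):
        totals[c] += w
        counts[c] += 1
    # w / (total / count) == w * count / total, i.e. w divided by the school mean weight
    return [w * counts[c] / totals[c] for w, c in zip(weights, cluster_ids)]


if __name__ == "__main__":
    print(rescale_weights_by_cluster([10, 30, 5, 5, 20], ["A", "A", "B", "B", "B"]))
    # [0.5, 1.5, 0.5, 0.5, 2.0]
```

Relative weights within each school are preserved; only their scale changes, so that the within-school weights no longer inflate the apparent level-1 sample size.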
Estimating relationships, particularly causal ones, is a fundamental challenge for researchers, because causal estimates generally require controlled experiments, which are usually impossible to carry out in the social sciences and in education. In the absence of experimental data, models are built to capture the probable connections between the dependent variable and the covariates assumed to be associated with it. Such models are successful to the extent that the covariates explain these associations with a reasonable degree of statistical significance.
It should be noted that unobservable school characteristics may also act on achievement and be correlated with the model residuals, including the school-level random effects. Covariates that are correlated with the error terms are known in econometrics as endogenous, and the relationships estimated for them cannot be given a causal interpretation. In a multilevel mixed binary regression framework such as the one used here, no single universal model exists. Both in the construction of the socioeconomic index and in the selection of variables for any model of student performance, some covariates are likely to be endogenous to a degree, particularly when variables are compared across schools.4 Acknowledging the possibility of endogeneity bias is a precondition for any proposal of causality between the variables, and it opens up important research questions.
In Peru, 46.7% of students are low performers in all three subjects assessed, and the subject-by-subject figures are even more concerning: 66.1% in mathematics, 53.7% in reading, and 58.5% in science. Table 2 also presents the differences between the high- and low-performing groups. All group means, including those of the high performers, are below 500 points, the reference mean of the PISA scale, and the gaps between groups are considerable. The largest gap is found in reading, where high performers outscore low performers by 145.7 points.
Table 2
Average scores and percentages by overall and high/low performance in PISA 2015, by subject
Performance   Mathematics        Reading            Science
              Mean     %         Mean     %         Mean     %
Overall       386.6    100       397.5    100       396.7    100
              (2.71)             (2.89)             (2.35)
High          477.2    33.9      475.8    46.3      470.4    41.5
              (2.04)   (1.41)    (2.10)   (1.49)    (0.83)   (1.40)
Low           340.1    66.1      330.1    53.7      344.7    58.5
              (1.42)   (1.41)    (1.70)   (1.49)    (1.26)   (1.40)
Difference    137.1              145.7              125.7
Note: standard errors in parentheses. Calculations apply to the ten plausible values for each subject. All values are significantly different from zero.
Source: compiled by the author, based on the OECD/PISA 2015 database.
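Because every student falls in exactly one of the two groups, each overall mean in Table 2 should equal the average of the two group means weighted by the group shares. The short check below applies that identity to the published (rounded) figures; the helper name is hypothetical, and small discrepancies are expected from rounding.

```python
# (high mean, high %, low mean, low %) from Table 2; overall means as published.
TABLE2 = {
    "Mathematics": (477.2, 33.9, 340.1, 66.1),
    "Reading": (475.8, 46.3, 330.1, 53.7),
    "Science": (470.4, 41.5, 344.7, 58.5),
}
PUBLISHED_OVERALL = {"Mathematics": 386.6, "Reading": 397.5, "Science": 396.7}


def overall_from_groups(high_mean, high_pct, low_mean, low_pct):
    """Overall mean as the share-weighted average of the two group means."""
    return (high_mean * high_pct + low_mean * low_pct) / (high_pct + low_pct)


if __name__ == "__main__":
    for subject, row in TABLE2.items():
        recomputed = overall_from_groups(*row)
        gap = row[0] - row[2]  # high-minus-low difference in points
        print(f"{subject:<12} recomputed {recomputed:.1f}  "
              f"published {PUBLISHED_OVERALL[subject]}  gap {gap:.1f}")
```

The recomputed means (386.6, 397.6, and 396.9) agree with the published values to within 0.2 points, and the gaps reproduce the "Difference" row.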
Another way of looking at the results is to consider the proportion of vulnerable students in the three subjects simultaneously. A Venn diagram of the subject intersections gives a simple, combined reading of performance: whereas just 27.6% of students performed well in all three subjects, 46.7% performed poorly in all three.5
Figure 1
Venn diagram: lowperformance overlap in the three subjects of PISA 2015
Region (subjects with low performance)    % of students (SE)
None (All: high)                          27.6 (1.39)
Only reading                              1.9 (0.31)
Only science                              2.1 (0.32)
Only mathematics                          8.9 (0.68)
Reading and mathematics                   2.9 (0.37)
Reading and science                       2.2 (0.26)
Science and mathematics                   7.7 (0.63)
All three (All: low)                      46.7 (1.42)
Note: standard errors in parentheses. All values are significantly different from zero. Calculations apply to the ten plausible values for each subject.
Source: compiled by the author, based on the OECD/PISA 2015 database.
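The Venn regions can be summed to recover each subject’s overall share of low performers, which offers a consistency check against Table 2. The sketch below assumes, as the figure title indicates, that each region counts students who are low performers in exactly the listed subjects; the helper names are hypothetical.

```python
# Figure 1 regions (% of students), keyed by the subjects in which the student is a
# low performer; the empty set is the "All: high" region.
REGIONS = {
    frozenset(): 27.6,
    frozenset({"reading"}): 1.9,
    frozenset({"science"}): 2.1,
    frozenset({"mathematics"}): 8.9,
    frozenset({"reading", "mathematics"}): 2.9,
    frozenset({"reading", "science"}): 2.2,
    frozenset({"science", "mathematics"}): 7.7,
    frozenset({"reading", "science", "mathematics"}): 46.7,
}


def low_share(subject):
    """% of students who are low performers in `subject` (sum of regions containing it)."""
    return sum(p for subjects, p in REGIONS.items() if subject in subjects)


def low_in_at_least(k):
    """% of students who are low performers in at least k of the three subjects."""
    return sum(p for subjects, p in REGIONS.items() if len(subjects) >= k)


if __name__ == "__main__":
    for subject in ("mathematics", "reading", "science"):
        print(subject, round(low_share(subject), 1))
    print("at least one subject:", round(low_in_at_least(1), 1))
    print("all regions:", round(sum(REGIONS.values()), 1))
```

The marginal shares (66.2%, 53.7%, and 58.7%) match Table 2 to within rounding, the regions sum to 100%, and 72.4% of students are low performers in at least one subject.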
In sum, 47% of students perform poorly in all three subjects and are thus in a position of academic risk: an alarmingly high figure that merits urgent attention.
The socioeconomic index as a basis for relating socioeconomic status with student performance
In Peru, the Learning Quality Measurement Office (Oficina de Medición de la Calidad de los Aprendizajes, UMC), the body responsible for PISA, calculates its own socioeconomic index (UMCISE) in place of the one used by the OECD, a change that the national report on PISA 2015 results (Ministerio de Educación del Perú – Unidad de Medición de la Calidad Educativa, 2017, p. 53) presents as a very important development. While the OECD’s index of economic, social, and cultural status (ESCS) is widely disseminated in PISA reports and aspires to be universal, it has also been criticized for failing to faithfully reflect the socioeconomic structure of each country in which the test is applied. The UMCISE is therefore an alternative index adapted to the Peruvian context and calculated by the UMC using its own methodology (Ministerio de Educación del Perú – Unidad de Medición de la Calidad Educativa, 2017, p. 54). It incorporates items covering parents’ education level; goods, assets, and services available to the household; and reading materials and educational resources available to the student. Its components are based on the model used in Peru’s 2015 General Assessment of Students (Evaluación Censal de Estudiantes), conducted throughout the country.
The index also takes into account parents’ occupational status and the characteristics of the national labor market, including its informality, and it adjusts the household-possessions variable, which includes uncommon items such as works of art and e-book readers. In addition, the UMCISE incorporates items relevant to economically disadvantaged segments of the population, such as housing materials and access to basic services (Ministerio de Educación del Perú – Unidad de Medición de la Calidad Educativa, 2017, p. 183). It is compatible with the socioeconomic indices of earlier PISA assessments, enabling comparisons over time between the 2009 and 2015 results.6
When applied to schools, this index can also be used to calculate each school’s socioeconomic composition, simply as the average of the index values of the students at that school. As noted earlier, several studies find that school socioeconomic composition is the factor with the greatest effect on performance (Caro & Lenkeit, 2012; Sirin, 2005).
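The averaging step can be sketched as follows. The function name is hypothetical and the mean is unweighted, as the text describes; a weighted variant using the student weights would be a reasonable alternative, but the source does not say which was used.

```python
from collections import defaultdict
from statistics import fmean


def school_composition(student_index, school_ids):
    """Unweighted mean of the student socioeconomic index within each school.

    Returns a dict {school_id: mean index} and, for convenience, a list that
    attaches each student's school mean so it can enter a model as a level-2 covariate.
    """
    by_school = defaultdict(list)
    for value, school in zip(student_index, school_ids):
        by_school[school].append(value)
    means = {school: fmean(values) for school, values in by_school.items()}
    return means, [means[school] for school in school_ids]


if __name__ == "__main__":
    means, per_student = school_composition([-1.0, 0.0, 1.0, 2.0], ["a", "a", "b", "b"])
    print(means)        # {'a': -0.5, 'b': 1.5}
    print(per_student)  # [-0.5, -0.5, 1.5, 1.5]
```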
To understand the profiles of the student and school socioeconomic indices, it is useful to visually present the distributions of their probability densities.
Figure 2
Socioeconomic index probability density, by students and schools
Source: compiled by the author, based on the OECD/PISA 2015 database.
Figure 2 plots the density of each index on the same scales.7 Both indices are approximately normally distributed, and the distribution of the school index is more concentrated around the median.8 The three percentiles marked in the figure lie at similar positions in the two distributions. A detailed comparison of the values of the indices shows their respective distributions more clearly.
Table 3
Average values and percentiles of socioeconomic indices, by students and schools
                            Average        Minimum  p25            p50           p75           Maximum  Variance
UMCISE (students)           0.019 (0.03)   -3.31    -0.76 (0.03)   0.10 (0.05)   0.82 (0.03)   2.06     1.02 (0.03)
School socioeconomic index  0.025 (0.03)   -2.46    -0.64 (0.08)   0.03 (0.04)   0.60 (0.04)   1.32     0.06 (0.03)
Note: The overall correlation between the indices is 0.74 (0.01). Standard errors in parentheses. All values are significantly different from zero.
Source: compiled by the author, based on the OECD/PISA 2015 database.
By definition, the value of each index rises from one percentile to the next. The graphs thus allow the distributions to be compared at the 25th percentile (P25, a quarter of the distribution), the median (P50), and the 75th percentile (P75, three quarters of the distribution). The student index has greater variance than the school index. The relatively greater concentration of students in certain ranges of the index is part of the explanation for their high and low performance, an issue explored later.
Socioeconomic performance gradients by subject
Socioeconomic indices derived from assessment questionnaires facilitate the analysis of performance under the common convention that socioeconomically disadvantaged students are those in the lowest quantile of the index distribution and, conversely, advantaged students are those in the highest quantile (or in another upper segment of the distribution). This approach, frequently adopted in the analysis of PISA data, is also used in this study to determine differences by student socioeconomic status.
The classical way of measuring the impact of socioeconomic status is a simple regression of performance on the index, whose parameters indicate the slope and the strength of the relationship. The term “socioeconomic gradient” refers to the linear relationship between performance in one of the measured subjects and the UMCISE. Steeper slopes indicate larger performance differences associated with socioeconomic status, whereas flatter slopes and weaker relationships are associated with greater equity. Figure 3 presents these relationships for each of the three subjects.
Figure 3
Student performance by socioeconomic index and subject
                              Science         Mathematics     Reading
UMCISE slope                  33.9 (1.575)    34.9 (1.775)    44.4 (1.869)
UMCISE % explained variance   20.3 (0.015)    18.7 (0.016)    25.9 (0169)
Constant                      398.6 (1.799)   388.6 (2.249)   399.9 (2.212)
Note: standard errors in parentheses. Calculations apply to the ten plausible values for each subject. All values are significantly different from zero.
Source: compiled by the author, based on the OECD/PISA 2015 database.
The fitted lines for the three subjects diverge between the lower and upper ends of the index. Reading is a case in point: at the lower end of the index its line shows scores similar to or below those of the other subjects, but its steeper slope makes the advantage associated with higher socioeconomic status clearly larger. The mathematics and science lines, by contrast, are nearly parallel, rising steadily with socioeconomic status without crossing.
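Because Figure 3 reports each subject’s intercept (constant) and slope, the behavior of the fitted lines can be checked directly. The sketch below uses only the rounded coefficients from Figure 3; the helper names are hypothetical.

```python
# Constant and slope from Figure 3: predicted score = constant + slope * UMCISE.
LINES = {
    "Science": (398.6, 33.9),
    "Mathematics": (388.6, 34.9),
    "Reading": (399.9, 44.4),
}


def predicted(subject, x):
    """Predicted score for a given UMCISE value."""
    constant, slope = LINES[subject]
    return constant + slope * x


def crossing(subject_1, subject_2):
    """UMCISE value at which the fitted lines of two subjects intersect."""
    (a1, b1), (a2, b2) = LINES[subject_1], LINES[subject_2]
    return (a2 - a1) / (b1 - b2)


if __name__ == "__main__":
    for pair in [("Reading", "Mathematics"), ("Reading", "Science"), ("Mathematics", "Science")]:
        print(pair, round(crossing(*pair), 2))
```

With these coefficients, the reading line crosses the mathematics line at an index value of about -1.19 and the science line at about -0.12, so reading is predicted lowest of the three at the bottom of the index and highest at the top. The mathematics and science lines would meet only at an index value of about 10, far beyond the maximum of 2.06 reported in Table 3.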
The fitted line is only an average summary of the association between performance and socioeconomic background. If all students were located on the line, performance could be predicted solely from the socioeconomic index. This is of course not the case: results are scattered around the line, with socioeconomically disadvantaged students who score highly and, conversely, advantaged students who score poorly.
Thus:
a) The strength of the relationship between achievement and socioeconomic status is reflected in how closely the results cluster around the line of best fit: the closer the points lie to the line, the stronger the relationship and the better socioeconomic status predicts performance. This aspect of the gradient is measured by the percentage of the variance in performance explained by the socioeconomic index; a high percentage means that performance is largely accounted for by the index. The strength of the relationship is 18.7% in mathematics, 20.3% in science, and 25.9% in reading, all high values. For reference, socioeconomic status explains on average 13% of the variance in science results across OECD countries.
b) The slope of the gradient captures the impact of socioeconomic status on performance. A steep slope means that the index has a large impact on performance, reflected in a wide difference between advantaged and disadvantaged students; greater equity is expressed in a flatter slope. In Peru, the positive slopes confirm the advantage associated with higher socioeconomic status, but its size differs by subject: each one-unit increase in the UMCISE is associated with an increase of 33.9 points in science, 34.9 points in mathematics, and 44.4 points in reading.
For reference, across OECD countries a one-unit increase in the index of socioeconomic status9 is associated with an increase of 38 points in the average science score, slightly more than the 33.9 points observed in Peru.
Note that the slope and the strength of the gradient measure different aspects of the relationship. When the slope is steep and the relationship is strong, the challenges are greater: student performance is more likely to be influenced by socioeconomic status, which translates into a wider gap between advantaged and disadvantaged students in the education system. The equations will show that this is the case in Peru.
The variability of schools in performance
The current consensus is that school socioeconomic composition is a significant predictor of academic performance and serves to guide education policies. Yet, in their well-known study, Perry and McConney (2010) warn that very little is known about the relationship between school and student socioeconomic status when both are considered simultaneously. Drawing on the PISA 2003 results in reading and mathematics, the authors point to the preeminence of school status in explaining performance, and call for education policies that balance school composition and reduce socioeconomic segregation.
PISA 2015 data permit the variability of performance between schools and between students to be examined through the intraclass correlation. The intraclass correlation coefficient (ICC) is obtained from a null (unconditional) model, which contains no explanatory variables and simply partitions the total variance into between-school and within-school components. The ICC is interpretable and useful for multilevel models because it represents the correlation between two observations within the same cluster. The greater the correlation within clusters (that is, the larger the ICC), the smaller the variability within clusters and, thus, the greater the share of variability between clusters.10
Fitting such a null model to the scores in each subject, without any explanatory factor, yields its intraclass correlation. Table 4 shows that the between-school share of variance is relatively high and differs by subject: 44.1% in science, 40.1% in mathematics, and 53.8% in reading. These percentages reflect the differences between schools and the homogeneity within them.11 The remaining variance (55.9%, 59.9%, and 46.2%, respectively) lies within schools and relates to the students’ characteristics and their socioeconomic and family contexts. These figures indicate that a large share of performance variability lies between schools, before the influence of any other factor is taken into account.
Table 4
Intraclass correlation: percentage of school/student variability by subject

| | Science | Mathematics | Reading |
|---|---|---|---|
| ICC total | 44.1 (0.005) | 40.1 (0.005) | 53.8 (0.005) |

Note: standard errors in parentheses. Calculations apply to the ten plausible values for each subject. All values are significantly different.
Source: compiled by the author, based on the OECD/PISA 2015 database.
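The ICC in Table 4 is a ratio of variance components from the null model. The sketch below is a minimal illustration, not the author's procedure: it estimates the ICC for a single outcome with the classical one-way random-effects ANOVA estimator and ignores PISA sampling weights and the ten plausible values. The helper `icc_logistic` shows the latent-variable variant often used for random-intercept logistic models, where the student-level variance is fixed at π²/3; whether the author's logistic ICCs follow this convention is an assumption. Both function names are hypothetical.

```python
import math
from statistics import mean


def icc_anova(groups):
    """ICC(1) from a null model via the one-way random-effects ANOVA estimator.

    groups: list of lists, one inner list of scores per school (cluster).
    Hypothetical helper: unweighted, single outcome, no plausible values.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]

    # Between-school and within-school sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective cluster size, adjusted for unequal school sizes
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    var_between = max((ms_between - ms_within) / n0, 0.0)  # truncate negative estimates at 0
    return var_between / (var_between + ms_within)


def icc_logistic(var_between):
    """Latent-variable ICC for a random-intercept logit model (student-level variance = pi^2 / 3)."""
    return var_between / (var_between + math.pi ** 2 / 3)


# Toy data: three schools whose means differ far more than the scores within them
schools = [[400, 410, 405], [450, 455, 460], [350, 345, 355]]
print(round(icc_anova(schools), 3))  # 0.991
print(round(icc_logistic(1.0), 3))   # 0.233
```

In the toy data almost all the variance lies between schools, so the ICC approaches 1; identical school means would drive it to 0.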
Socioeconomic indices as factors associated with performance
Although the variation in students' socioeconomic profiles within schools, and its influence on achievement, is reasonably well understood, combining student and school profiles yields information that is useful for guiding educational improvement policies (Muelle, 2016; Monseur & Crahay, 2008).
As noted, the regression coefficients, represented by the slopes of the lines, identify a covariate's effect on the outcome. To this end, two socioeconomic indices, that of the student and that of the school, can be compared in order to measure their respective effects on performance.
The UMCISE can be divided into quartiles, so that each group contains 25% of the data. Thus, four groups correspond to students (called Q1 to Q4 students) and four to schools (called Q1 to Q4 schools). On a sliding scale, these groups represent "very low," "low," "middle," and "upper" socioeconomic status.
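The cross-classification of student and school quartiles can be sketched as follows. The helper names are hypothetical; the sketch cuts each index at its sample quartiles with Python's `statistics.quantiles` (default "exclusive" method), treats every student record as equally weighted, and ignores plausible values, so it illustrates only the grouping logic behind a 16-cell grid such as the one in Appendix 2, not the author's exact computation.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, quantiles


def quartile_labels(values):
    """Label each value Q1..Q4 according to the sample quartiles of `values`."""
    cuts = quantiles(values, n=4)  # three cut points
    return [f"Q{bisect_right(cuts, v) + 1}" for v in values]


def cell_means(student_index, school_index, scores):
    """Mean score for each (school quartile, student quartile) cell: up to 16 cells."""
    student_q = quartile_labels(student_index)
    school_q = quartile_labels(school_index)
    cells = defaultdict(list)
    for sch, stu, score in zip(school_q, student_q, scores):
        cells[(sch, stu)].append(score)
    return {cell: mean(vals) for cell, vals in sorted(cells.items())}


# Hypothetical data for eight students; each school index value is shared by two students
student_index = [-1.2, -0.4, 0.1, 0.9, -0.8, 0.3, 1.1, 1.5]
school_index = [-1.0, -1.0, -0.2, -0.2, 0.4, 0.4, 1.0, 1.0]
scores = [330, 360, 390, 430, 350, 410, 440, 470]
print(cell_means(student_index, school_index, scores))
```

Keying cells as (school quartile, student quartile) mirrors the layout of Appendix 2, where rows are school quartiles and columns are student quartiles.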
Figure 4 shows the 16 combinations obtained by crossing the student and school quartiles, and the mean score associated with each. As student socioeconomic status rises, so does performance, but the extent of the increase depends on the subject and on the school socioeconomic index.
Figure 4
Performances by subject, and by student and school socioeconomic quantiles
Note: standard errors in parentheses. Calculations apply to the ten plausible values. The detailed values are given in Appendix 2.
Source: compiled by the author, based on the OECD/PISA 2015 database.
Thus, regardless of students' own socioeconomic origins, their performance is higher when they attend schools with higher socioeconomic indices. The rate of growth is greater for more advantaged students (Q4), who from the outset achieve higher scores than other students and present steeper growth slopes. Not all disadvantaged students attend disadvantaged schools, though this assertion must be qualified: disadvantaged students (Q1 students) achieve lower results than their more advantaged peers in all cases and across all subjects. In contrast, for advantaged students (Q4 students), the school they attend does not change this ranking: even in a disadvantaged school (Q1 school), they always obtain better results than their less advantaged peers.
In Figure 4, this finding is expressed by the similarity of the slopes of the lines relating student socioeconomic status to academic performance. Progress beyond the Level 2 baseline, which denotes the achievement of better results, occurs only for students attending schools whose socioeconomic composition corresponds to the middle and upper levels (Q3 and Q4 schools).
As a consequence, attending a school with an advantaged socioeconomic composition can improve the performance of disadvantaged students, but never to the point of equaling the performance of advantaged students. To be sure, this does not constitute social determinism; on the contrary, it could be the case that high-performing disadvantaged students are enrolled by their parents in schools with a higher socioeconomic status. However, the data available do not allow this possibility to be explored.
Because high/low performance is a binary dependent variable, it is analyzed with logistic regression, which compares probabilities between categories. To this end, odds ratios are used, with Q1 of the UMCISE (the "very low" student socioeconomic category) as the baseline. Table 5 presents the distribution of high and low performance by quartile, together with the odds ratio comparing the other categories (Q2, Q3, and Q4) with the most disadvantaged category (Q1).
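For readers less familiar with odds ratios, the sketch below shows the quantities involved: the odds of low performance in a group are p/(1 - p), and the odds ratio divides the odds in one group by those in the reference group; in a logistic regression, the odds ratio of a predictor is exp(β). The function names and the proportions in the usage lines are hypothetical, and the odds ratios in Table 5 come from the author's logistic regressions on the ten plausible values, so this raw calculation is not meant to reproduce them.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1 - p)


def odds_ratio(p_group, p_reference):
    """Odds of the event in one group divided by the odds in the reference group."""
    return odds(p_group) / odds(p_reference)


def coefficient_from_odds_ratio(or_value):
    """Logistic-regression coefficient (log-odds scale) corresponding to an odds ratio."""
    return math.log(or_value)


# Hypothetical shares of low performers in two groups
p_disadvantaged, p_advantaged = 0.80, 0.40
print(round(odds_ratio(p_disadvantaged, p_advantaged), 2))  # 6.0
```

Here the odds ratio is 6.0 even though the share of low performers is only twice as high (0.80 versus 0.40), which is why an odds ratio should not be read as a ratio of probabilities.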
Table 5
High and low student performance by subject and socioeconomic quartile, in percentages and odds ratio

| Performance | Q1 students | Q2 students | Q3 students | Q4 students | Odds ratio, Q1 vs Q2/Q4 |
|---|---|---|---|---|---|
| High science | 15.6*** (1.13) | 32.9*** (1.78) | 52.8*** (1.87) | 66.1*** (2.01) | |
| Low science | 84.4*** (1.13) | 67.1*** (1.78) | 47.2*** (1.87) | 33.9*** (2.01) | 5.4 (0.47) |
| Total | 100 | 100 | 100 | 100 | |
| High mathematics | 11.4*** (1.22) | 25.0*** (1.74) | 41.9*** (2.04) | 58.7*** (2.40) | |
| Low mathematics | 88.6*** (1.22) | 75.0*** (1.74) | 58.1*** (2.04) | 41.3*** (2.40) | 7.8 (0.99) |
| Total | 100 | 100 | 100 | 100 | |
| High reading | 15.7*** (1.39) | 37.2*** (2.01) | 60.1*** (1.84) | 74.2*** (1.78) | |
| Low reading | 84.3*** (1.39) | 62.8*** (2.01) | 39.9*** (1.84) | 25.8*** (1.78) | 5.4 (0.57) |
| Total | 100 | 100 | 100 | 100 | |

Note: Logistic regression per subject. Standard errors in parentheses. Calculations apply to the ten plausible values for each subject. * p<0.05, ** p<0.01, *** p<0.001.
Source: compiled by the author.
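The table notes state that calculations apply to the ten plausible values for each subject. A common way to combine such results, described in the OECD's PISA documentation, is to run the analysis once per plausible value and pool the estimates with Rubin's rules: the final estimate is the mean of the M estimates, and its variance adds the average sampling variance to the between-plausible-value variance multiplied by (1 + 1/M). The sketch assumes this convention; the author's exact procedure (for instance, how sampling variances were obtained from replicate weights) is not stated, and the inputs are hypothetical.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Combine M per-plausible-value estimates with Rubin's rules.

    estimates: one estimate per plausible value.
    sampling_variances: squared standard error of each estimate.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical mean science scores estimated on each of ten plausible values
estimates = [397.2, 398.0, 396.5, 397.8, 398.4, 396.9, 397.5, 398.1, 397.0, 397.6]
sampling_variances = [4.0] * 10
estimate, se = pool_plausible_values(estimates, sampling_variances)
print(round(estimate, 2), round(se, 2))
```

Odds ratios are usually pooled on the log scale (averaging the coefficients β) and exponentiated afterward, rather than averaged directly.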
According to the percentages, 66% of socioeconomically advantaged students (Q4) achieve high performance in science, 59% in mathematics, and 74% in reading. Socioeconomically disadvantaged students (Q1), on the other hand, are concentrated in the low-performance category: 84%, 89%, and 84%, respectively, in the same subjects. These values are in keeping with the earlier results.
Meanwhile, the odds ratio shows that students in the most disadvantaged socioeconomic quartile (Q1) have 5.4 times higher odds of low achievement in science, 7.8 times in mathematics, and 5.4 times in reading than students in the most advantaged quartile (Q4). Again, high performance appears to be reserved for the more advantaged social categories, while low performance mainly corresponds to students from more modest socioeconomic backgrounds.
After comparing the differences between the categories, three sequential logistic regression models12 are fitted to distinguish between high and low performance: the first includes the student socioeconomic index, the second the school socioeconomic index, and the third both indices and their interaction.
Table 6
Odds ratio of the socioeconomic indices by high and low performance, and intraclass correlation coefficient (%)

| Odds ratio | Model 1 S | Model 1 M | Model 1 R | Model 2 S | Model 2 M | Model 2 R | Model 3 S | Model 3 M | Model 3 R |
|---|---|---|---|---|---|---|---|---|---|
| Student | 0.39 (0.02) | 0.39 (0.02) | 0.34 (0.02) | | | | 0.70 (0.03) | 0.67 (0.04) | 0.66 (0.04) |
| School | | | | 0.22 (0.02) | 0.22 (0.02) | 0.16 (0.02) | 0.31 (0.03) | 0.34 (0.04) | 0.25 (0.03) |
| Student and school | | | | | | | 0.94 (0.07) | 0.86 (0.07) | 0.94 (0.08) |
| Constant | 1.48 (0.08) | 2.18 (0.13) | 1.19 (0.07) | 1.54 (0.08) | 2.29 (0.14) | 1.22 (0.07) | 1.56 (0.09) | 2.42 (0.16) | 1.24 (0.08) |
| ICC | | | | | | | 11.8 (0.02) | 11.8 (0.02) | 18.1 (0.05) |

Note: Model 1: Student; Model 2: School; Model 3: Students and schools. S=science; M=mathematics; R=reading. Standard errors in parentheses. Calculations apply to the ten plausible values for each subject. All values are significantly different from zero.
Source: compiled by the author, based on the OECD/PISA 2015 database.
The odds ratios again confirm the above results, revealing school composition to be the most powerful determinant of performance, and of greater importance than student socioeconomic status.
Thus, the table shows that an increase in the school socioeconomic index strongly reduces the odds of a student being in the low-results category: the odds ratio falls from 1 to 0.22 in science and mathematics, and to 0.16 in reading. This direct protective effect of the school is greater than that of the student's own socioeconomic origins, whose odds ratios range from 0.39 to 0.34 for the same subjects.
Regression with interaction, which takes into account the nested character of the two indices (student and school),13 indicates that both combine to reduce the probability of a student being in the low-performance category, but that this probability depends mainly on the school: the benefit is greatest when a socioeconomically advantaged student attends a socioeconomically advantaged school.
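To see how the Model 3 odds ratios combine, they can be converted back to log-odds coefficients and into a predicted probability of low performance. The sketch below uses the science column of Table 6 and rests on several assumptions that are not stated in the text: the specification logit(p) = b0 + b1·student + b2·school + b3·student·school, the reported constant read as the baseline odds exp(b0), index values of -1 and +1 taken as illustrative "disadvantaged" and "advantaged" positions, and the school random effect set to zero (an average school). The constant names and `prob_low` are hypothetical.

```python
import math

# Science, Model 3 in Table 6 (odds-ratio scale)
BASELINE_ODDS = 1.56    # constant, read here as exp(b0)
OR_STUDENT = 0.70
OR_SCHOOL = 0.31
OR_INTERACTION = 0.94


def prob_low(student, school):
    """Predicted probability of low performance at given index values (random effect = 0)."""
    log_odds = (math.log(BASELINE_ODDS)
                + math.log(OR_STUDENT) * student
                + math.log(OR_SCHOOL) * school
                + math.log(OR_INTERACTION) * student * school)
    return 1 / (1 + math.exp(-log_odds))


for student, school in [(-1, -1), (1, -1), (-1, 1), (1, 1)]:
    print(student, school, round(prob_low(student, school), 2))
```

With these inputs, the predicted probability for a disadvantaged student falls far more when the school index rises (about 0.87 to 0.42) than when the student's own index rises in a disadvantaged school (about 0.87 to 0.79), which is consistent with the reading that the probability depends mainly on the school.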
For its part, the ICC falls sharply in comparison with the null model. Recall that the ICC of the null model is 44.1% in science, 40.1% in mathematics, and 53.8% in reading (see Table 4); once socioeconomic status is factored in, variability between schools falls to 11.8% for science and mathematics and 18.1% for reading. This confirms, in extenso, the primacy of the school when it comes to explaining poor student performance.
In light of these results and the aims of this study, a model is proposed to explain low performance using a set of selected variables that are assumed to be significantly associated with it.
In the econometric relationships presented here, it is assumed that the independent variables included in the model are associated with the dependent variable they seek to explain, but they do not necessarily constitute channels of causality that determine, or are determined by, this variable. Thus, it is necessary to avoid bias in the selection of control variables and in the inference of the results (Angrist & Pischke, 2008).
The contextual variables are weighted likelihood estimates (WLE) derived through item response theory and expressed on a standard metric with a mean of 0 and a standard deviation of 1.
Forty-eight synthetic indices were identified; as variables, they are intended to cover all optional questionnaire responses: 26 indices correspond to students, nine to schools, nine to teachers, and four to parents.
As is common in econometrics, algorithms based on statistical criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are used to select the variables, allowing a trade-off between goodness of fit and parsimony. In the sample on which this study is based (as in any questionnaire), many questions were unanswered or inapplicable. As a result, some synthetic indices have missing values, sometimes in significant proportions, and including them in multivariate models substantially reduces the number of observations.14 Thus, only variables with missing values below 10% were selected for this study.
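The two criteria penalize the maximized log-likelihood ln L by the number of estimated parameters k: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where n is the number of observations. Lower values are preferred, and BIC penalizes additional parameters more heavily than AIC once n exceeds about 7. The sketch below applies a 10% missing-value screen and then ranks candidate models by both criteria. It is a generic illustration, not the SELECT routine cited in the text; the index names, missing shares, and log-likelihoods are hypothetical, and only the observation count is taken from Table 7.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def keep_low_missing(missing_share, threshold=0.10):
    """Keep variables whose share of missing values is below the threshold."""
    return [name for name, share in missing_share.items() if share < threshold]


# Hypothetical missing-value shares for three candidate indices
missing = {"INDEX_A": 0.03, "INDEX_B": 0.05, "INDEX_C": 0.22}
print(keep_low_missing(missing))  # ['INDEX_A', 'INDEX_B']

# Hypothetical candidate models: (label, maximized log-likelihood, number of parameters)
n_obs = 6425
candidates = [("small", -3900.0, 6), ("medium", -3850.0, 12), ("large", -3840.0, 20)]
best_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
best_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))
print(best_aic[0], best_bic[0])  # large medium
```

In this example AIC prefers the larger model while BIC, with its heavier penalty, prefers the medium one, which illustrates the fit-parsimony trade-off described above.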
The selection algorithm SELECT (Lindsey, 2014) identified the largest reasonable set of indices under the AIC and BIC goodness-of-fit criteria. Adding other synthetic variables would bring no further explanatory advantage to the model, and excluding them prevents an inflation of variables that would increase the percentage of missing values.15 Of the 6,971 original observations, only 9.1% are missing due to lack of response. The final model retains 11 significant variables associated with low performance. It should be recalled that this study seeks to explain high/low performance, not overall performance. Thus, although the following 14 variables are not significant in this model, this does not mean that they are any less important for other models:
• RESPRES: Responsibility for resources.
• CREACTIV: Creative extracurricular activities (Sum).
• SCIERES: Index of science-specific resources (Sum).
• TEACHPART: Teacher participation (Sum).
• EDUSHORT: Shortage of educational material (WLE).
• STUBEHA: Student behavior hindering learning (WLE).
• TEACHBEHA: Teacher behavior hindering learning.
• RATCMP_SCH: Number of computers per school (WLE).
• STAFFSHORT: Shortage of teaching staff (WLE).
• PROATCE: Proportion of qualified teachers.
• RESPCUR: Responsibility for curriculum and assessment.
• SCHAUT: Average school autonomy.
• STRATIO: Student/teacher ratio.
• DISCLISCI: Disciplinary climate in science classes (WLE).
Three further indicators are added to this list of contextual variables and are also nonsignificant in explaining low performance:
• The urban/rural dichotomy, in which urban schools are traditionally advantaged. This distinction disappears when it comes to explaining modest performances: the geographical location of the school does not influence poor performance, at least under its current definition.
• The nonstate/state distinction, in which nonstate schools are traditionally advantaged; here, however, the difference is nonsignificant for low performance. It remains to be seen whether this is because public schools have adopted a new administrative and social model that has improved their performance, or because private schools have adopted a model that differs from the traditional one.
• Preschool education, which does not provide the advantages once attributed to it. The progress made in covering a significant proportion of the preschool-age population appears to have increased student homogeneity.
The values of these nonsignificant variables are presented in Appendix 1. Because the dependent variable (high/low performance) is binary and is modeled with multilevel logistic regression, the coefficients are expressed as odds ratios.
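For reference, a generic two-level random-intercept logistic model of the kind described can be written as follows. Here p_ij = Pr(y_ij = 1 | u_j) is the probability that student i in school j is a low performer, x_kij are student-level covariates, z_lj are school-level covariates, and u_j is the school random effect. The notation is assumed for illustration rather than taken from the author; exponentiating a coefficient gives the odds ratio reported in the tables, and the second line shows one common latent-variable definition of the ICC for this kind of model.

```latex
\ln\frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \sum_{k} \beta_k x_{kij} + \sum_{l} \gamma_l z_{lj} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)

\mathrm{OR}_k = e^{\beta_k}, \qquad
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
```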
The ICC, which expresses the correlation between observations within the same school, takes values that are even lower than those obtained when only the student and school indices were taken into account (Table 6): 7.3% in science, 6.3% in mathematics, and 11.1% in reading.
In other words, once this set of variables is included, only modest variability between schools remains. Because students within a cluster (school) are not independent of one another and share random effects from their cluster, these results open avenues to continue exploring other social factors associated with performance, and to incorporate variables that may have been omitted or merit better measurement. It is worth reiterating the risk of endogeneity that arises within the dense set of variables presented; further analysis using different techniques and statistical tests would be necessary to limit and correct for this risk.
Table 7
Model of the effect of significant variables on the low performance of students in PISA 2015 subjects (in odds ratio values)

| Variable code and label | Science | Mathematics | Reading |
|---|---|---|---|
| BELONG: Sense of belonging to the school | 0.719 (0.035) | 0.775 (0.037) | 0.669 (0.029) |
| ANXTEST: Test anxiety | 1.328 (0.054) | 1.406 (0.076) | 1.241 (0.049) |
| MOTIVAT: Achievement motivation | 0.682 (0.022) | 0.667 (0.042) | 0.786 (0.040) |
| UMCISE: Peru-level student socioeconomic index | 0.801 (0.032) | 0.774 (0.049) | 0.729 (0.054) |
| UMCISESCH: Peru-level school socioeconomic index | 0.409 (0.032) | 0.414 (0.051) | 0.348 (0.048) |
| ST004D01T: Student gender: Male (reference) | 1 | 1 | 1 |
| Female | 1.624 (0.132) | 1.625 (0.084) | 1.005 (0.065) |
| NOTRUANCY: No truancy at the school | 0.993 (0.002) | 0.985 (0.002) | 0.996 (0.003) |
| REPEAT: Repeated grade: No (reference) | 1 | 1 | 1 |
| Yes | 1.982 (0.206) | 1.888 (0.175) | 1.854 (0.121) |
| ST022Q01TA: Language at home: Spanish (reference) | 1 | 1 | 1 |
| Other | 2.558 (0.534) | 2.671 (0.431) | 3.258 (0.543) |
| SCHSIZE: School size: Large (reference) | 1 | 1 | 1 |
| Medium | 1.400 (0.125) | 1.427 (0.109) | 1.583 (0.113) |
| Small | 1.707 (0.169) | 1.743 (0.329) | 1.774 (0.250) |
| On-time enrollment: Late (reference) | 1 | 1 | 1 |
| On-time | 0.416 (0.054) | 0.411 (0.051) | 0.366 (0.029) |
| Constant | 0.758 (0.404) | 2.091 (0.012) | 1.498 (0.093) |
| ICC, % | 7.3 (0.002) | 6.3 (0.002) | 11.1 (0.03) |
| Number of observations | 6,425 | 6,425 | 6,425 |

Note: standard errors in parentheses. Calculations apply to the ten plausible values for each subject. All values are significantly different from zero.
Source: compiled by the author, based on the OECD/PISA 2015 database.
The estimated coefficients confirm the dual protective influence of school and student socioeconomic status. In all cases, advantaged socioeconomic status is an important protective factor against low performance, and this effect is greater when the school is socioeconomically advantaged.
To facilitate analysis of the results, a summary is presented below of the nominal categories from the final model that have proven to be significant.
• Gender
Gender disparities in mathematics (in favor of males) and in reading (in favor of females) have been highlighted repeatedly. The results for low performance corroborate this difference in science and mathematics, where female students have about 62% higher odds than males of performing poorly. This is not the case in reading, in which there is no significant gender difference among low performers.
• Repetition
Holding other factors constant, students who repeat a grade have roughly twice the odds of performing poorly (odds ratios of 1.85 to 1.98) compared with those who do not repeat. When this association is estimated directly, without factoring in any other variables, the figure rises up to fivefold, making repetition one of the most serious determinants, with indisputable moral, psychological and, given its cost, economic consequences.
• On-time enrollment
On-time enrollment means that a student is in the modal grade for their age in the PISA test. Here, the direct association is as negative as in the case of repetition: students below their grade level are 7.3 times more likely to perform poorly than students at their grade level.16 On-time enrollment is, of course, associated with repetition, but some students may also be below grade level because they started school late, whether voluntarily or not, without ever having repeated a grade.
• Language
Peru is characterized by its indigenous multilingual and multicultural diversity, which poses educational challenges in terms of coverage and quality for students who speak a language other than Spanish (the PISA testing language) at home. Speaking Spanish at home (92% of students) is associated with better results in all tests. This advantage, as might be expected, is greatest in reading. The differences by language are considerable: a student who does not speak Spanish at home is 12.9 times more likely to perform poorly in reading than a student who does. The figure is nine times in science and eight times in mathematics.
• School size
Large schools are defined as those with 575 students or more; medium-sized schools have between 150 and 574 students; and small schools have 150 students or fewer (Ministerio de Educación del Perú – Unidad de Medición de la Calidad Educativa, 2017, p. 59). The percentages of students in Peru attending each size category are 51%, 34%, and 15%, respectively.
According to the model, students who attend a small school have almost twice the odds of obtaining poor results (odds ratios of 1.71 to 1.77). This spatial dimension may not be the chief determinant, but it is still worth exploring.
• Truancy
Students who are never absent from class perform better than those who miss classes. The advantage appears similar across the three subjects in terms of probability. This is undoubtedly a complex phenomenon that calls for social, cultural, economic, and motivational approaches.
• Motivational indices
From the vast collection of synthetic indices available, the modeling detected three motivational indices that are significantly associated with performance and whose values prove to be just as important as those of the socioeconomic indices. Subject to the usual caveats, these factors also merit study in national assessments.
These indices are constructed from sets of Likert-scale questions, with student responses recorded as "strongly agree," "agree," "disagree," and "strongly disagree." A complete presentation is provided in the OECD technical report (OECD, 2016).
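The PISA indices themselves are IRT-scaled weighted likelihood estimates (OECD, 2016), which a short example cannot reproduce. As a simpler stand-in that shows only how response categories are oriented, the sketch below codes the four Likert categories from 1 to 4, reverse-codes negatively worded statements (such as "I feel like a stranger at school") so that a higher value always means more of the construct, and averages the items. The item keys, the choice of reversed items, and the raw-average scoring are assumptions for illustration, not the OECD method.

```python
# Four-point Likert coding; higher = stronger agreement
CODES = {"strongly disagree": 1, "disagree": 2, "agree": 3, "strongly agree": 4}


def scale_score(responses, reversed_items):
    """Mean item score after reverse-coding negatively worded items.

    responses: dict mapping item key -> response category (missing items omitted).
    reversed_items: set of item keys whose agreement indicates *less* of the construct.
    This is a raw average, not the IRT/WLE scaling used for the PISA indices.
    """
    values = []
    for item, answer in responses.items():
        code = CODES[answer]
        values.append(5 - code if item in reversed_items else code)
    return sum(values) / len(values)


# Hypothetical keys for three of the belonging statements
REVERSED = {"stranger", "lonely"}
student = {"stranger": "disagree", "belong": "agree", "lonely": "strongly disagree"}
print(round(scale_score(student, REVERSED), 2))
```

Disagreeing with "I feel like a stranger at school" thus counts toward a stronger sense of belonging, just as agreeing with "I feel that I belong at school" does.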
a) Sense of belonging: the index of belonging (BELONG) was constructed from the following statements:
• I feel like a stranger at school.
• I make friends easily at school.
• I feel that I belong at school.
• I feel uncomfortable and out of place at school.
• I get along with other students at school.
• I feel lonely at school.
As with the other indices in the model, a higher value denotes a greater sense of belonging. An increase in this index reduces the odds of low performance from 1 to around 0.7 (odds ratios of 0.67 to 0.78), that is, by between 22% and 33%, across all subjects. This reduction holds ceteris paribus, with the other variables in the model held constant.
b) Achievement motivation: the achievement motivation (MOTIVAT) index was constructed from the students' responses to the following:
• I want the best grades in most or all of my courses.
• I want to be able to select the best opportunities available when I graduate.
• I want to be the best in everything I do.
• I consider myself an ambitious person.
• I want to be one of the best students in my class.
Like the index of belonging, higher achievement motivation is associated with lower odds of low achievement across the three subjects, with odds ratios of about 0.7 in science and mathematics (0.68 and 0.67); the effect is weakest in reading (0.79).
c) Index of anxiety: the test anxiety index (ANXTEST) measures the responses to:
• I often worry that passing a test will be difficult.
• I worry about whether I will obtain good grades at school.
• Despite being well-prepared for a test, I feel very anxious.
• I feel nervous when I don't know how to complete an assignment at school.
The index of anxiety, on the other hand, has a detrimental effect in all three subjects: as it increases, the odds of low performance also increase, by 32.8% in science, 40.6% in mathematics, and 24.1% in reading.
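Because the figures above are odds ratios, the corresponding change in probability depends on the starting probability. The sketch below applies an odds ratio to a baseline probability p, giving p' = OR·o/(1 + OR·o) with o = p/(1 - p). It uses the science estimates for BELONG (0.719) and ANXTEST (1.328) from Table 7; the baseline probability of 0.60 and the function name are hypothetical.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    new_odds = odds_ratio * p_baseline / (1 - p_baseline)
    return new_odds / (1 + new_odds)


p0 = 0.60  # hypothetical baseline probability of low performance
for label, or_value in [("BELONG, science", 0.719), ("ANXTEST, science", 1.328)]:
    print(label, round(apply_odds_ratio(p0, or_value), 3))
```

From a baseline of 0.60, a one-unit rise in belonging brings the probability to about 0.52 (roughly eight percentage points lower), while a one-unit rise in anxiety raises it to about 0.67, so a 28% fall in the odds is a considerably smaller change in the probability itself.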
Figure 5 shows the effects of the indices.
Figure 5
Probabilities of effect of motivational indices on high/low performance, by subject
Note: standard errors in parentheses.
Source: compiled by the author, based on the PISA 2015 Peru database.
The favorable influence of the motivational indices in preventing low performance is evident. The effect of a sense of belonging appears greater, except in mathematics, where its slope runs parallel to that of achievement motivation. Conversely, across the subjects, the probability of low performance increases with the index of anxiety, particularly in mathematics.
Other motivational and behavioral factors also come into play. For instance, to explain low performance from a meritocratic perspective, Wiederkehr et al. (2015) propose that a student's perception of self-efficacy is a psychological factor that explains the low performance of socially disadvantaged students.
The increasing interest in the behavioral factors that tend to be associated with performance opens up the possibility of including the measurement and development of other socioemotional skills in assessments.
Two decades of educational assessment attest to some progress in terms of the quality and equity of education systems, but they also show that socioeconomic divides still contribute to significant gaps in student performance.
Thus, analysis of the effects of student and school socioeconomic status on performance is vital to understanding the mechanisms that determine learning achievement. In this regard, the official introduction by the UMC (the body responsible for monitoring PISA in Peru) of an index specific to Peru is valuable because, as its creators point out, it is adapted to the country’s particular characteristics.
The gradients of the regressions show systematically that as this index increases toward more favorable socioeconomic positions, so do the scores. For each one-unit increase in the index, scores rise by 33.9 points in science, 34.9 points in mathematics, and a notable 44.4 points in reading. The share of variance explained also shows the strength of the relationship with performance: the index accounts for 18.7% of the variance in mathematics, 20.3% in science, and 27.9% in reading. These values are considered high and attest to the strong impact of socioeconomic profile, greater even than the average for the OECD countries that participate in PISA, where, for instance, the index explains 13% of the variance in the science results.
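The gradient and the share of variance quoted here correspond to a simple linear regression of scores on the index: the slope is the score change per one-unit increase in the index, and R² is the proportion of score variance it explains. The sketch below computes both by ordinary least squares on hypothetical data. It ignores sampling weights and plausible values, which PISA-based estimates normally take into account, and `ols_gradient` is a hypothetical name.

```python
from statistics import mean


def ols_gradient(x, y):
    """Slope, intercept and R-squared of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((b - my) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1 - ss_resid / ss_total


# Hypothetical index values and science scores
index = [-1.5, -0.8, -0.2, 0.0, 0.4, 0.9, 1.3]
score = [340, 360, 395, 385, 420, 410, 450]
slope, intercept, r2 = ols_gradient(index, score)
print(round(slope, 1), round(r2, 3))
```

An R² of 0.203, for example, would correspond to the 20.3% of the variance in science attributed to the index in the text.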
To measure the variability of achievement, the ICC indicates how much of it is attributable to the cluster, namely the school: the greater its value, the lower the variability within the cluster and, consequently, the greater the variability between clusters. Variability between schools is relatively high: 44.1% in science, 40.1% in mathematics, and 53.8% in reading are attributable to differences in characteristics between schools. The remainder is explained by student characteristics and their socioeconomic and family contexts.
Once the socioeconomic index is taken into account, the outlook becomes clear. The results reveal a stark socioeconomic distinction. Depending on the subject, between 59% and 74% of students of high socioeconomic status achieve high performance, compared with only 11% to 16% of students of very low socioeconomic status. Conversely, as is to be expected, between 84% and 89% of disadvantaged students fall into the low-performance category, against just 26% to 41% of advantaged students.
Moreover, whatever their socioeconomic background, students will perform better if they attend advantaged schools, though the effect is more marked if the students are also advantaged. These advantaged students, even when they attend disadvantaged schools, will always achieve better results than their counterparts of lower socioeconomic status. As a result, schools of advantaged socioeconomic composition improve the results of disadvantaged students, but never to the extent that they equal the results obtained by advantaged students.
Once the odds ratio is included, it can be seen that disadvantaged students always have higher odds than advantaged students of belonging to the group of low achievers: 5.4 times higher in science and reading, and 7.8 times higher in mathematics. When the two socioeconomic indicators are considered together, the socioeconomic composition of the school attended emerges as the dominant factor in this relationship.
This study draws on the vast collection of synthetic indicators from the PISA surveys to construct a model containing a set of variables associated with low performance. Multilevel mixed-effects logistic regression is employed to explain the binary variable of high/low performance.
The AIC and BIC selection criteria retain just 11 significant variables in explaining low performance. Grade repetition and home language are the variables with the largest odds ratios. These two factors feature repeatedly in PISA reports and studies, and their persistence reveals a need to review the education policies aimed at addressing these inequalities.
The model also indicates that gender makes no difference to low performance in reading, a departure from traditional findings in assessments that do not distinguish between levels of achievement. In science and mathematics, however, the difference always favors males. School size appears to work against small schools, where the probability of low performance is higher than in large schools. Predictably, regular school attendance protects against low performance.
An important characteristic in the model is the presence of emotional and motivational factors, such as sense of belonging to the school, anxiety, and motivation. The emergence of these motivational factors could indicate the need to take measures to improve school climate, among others.
Those factors that do not have significant effects on low performance must also be mentioned. Educational material and resources, availability of computers per school, the student/teacher ratio, shortages of teaching staff, the proportion of qualified teachers, and teacher participation appear not to affect low performance. Added to these factors are responsibility for educational resources, school autonomy, and disciplinary climate in the class.
The absence of the traditional difference between public and private schools is notable, probably due to the emergence of new forms and mechanisms of coexistence between these categories (Balarín, 2015). Moreover, no effects related to early education were recorded, perhaps because of its expansion some time ago.
The lack of significance of these and other factors does not mean that the educational policies related to them are unimportant. Moreover, several factors, such as the personal and professional characteristics of teachers, teaching methods, teachers' attitudes and behavior, and parental commitment and involvement, have not been explored here.
This study seeks to contribute to identifying the effects of student and school socioeconomic conditions on low performance. It distinguishes analytically between high and low academic performance through mixed-effects multilevel logistic regressions and the simultaneous use of plausible values as criterion variables in the three PISA 2015 subjects.
School socioeconomic composition proved to be the greatest determinant of low academic performance; therefore, the pursuit of balanced composition in which socioeconomic origin is not grounds for discrimination should be an urgent, fundamental objective of educational policy. This study reveals that performance differs by subject and according to factors related to student and school. These results support the idea of implementing specific educational policies to address these different factors.
The poor performance of Peruvian students in PISA 2015, as identified here, underscores the urgent need to reintroduce targeted educational policies. Without proposing cause-and-effect relationships between the factors associated with poor performance, this study recognizes that there may be mechanisms giving rise to endogeneity and interaction between the variables, which future research must explore further, and raises new questions related to securing improvements in learning.
Appendix 1
Compendium of nonsignificant results from the final model corresponding to low student performance, PISA 2015 (odds ratio).
| Variable code and label | Science | Mathematics | Reading |
|---|---|---|---|
| RESPRES: Responsibility for resources | 0.932 (0.103) | 0.912 (0.666) | 0.917 (0.077) |
| CREACTIV: Creative extracurricular activities (sum) | 1.038 (0.048) | 0.992 (0.051) | 1.020 (0.049) |
| SCIERES: Index of science-specific resources (sum) | 0.991 (0.022) | 1.010 (0.025) | 0.952 (0.024) |
| TEACHPART: Teacher participation | 1.041 (0.021) | 1.015 (0.025) | 1.054 (0.024) |
| EDUSHORT: Shortage of educational materials (WLE) | 1.008 (0.025) | 1.015 (0.036) | 1.032 (0.075) |
| STUBEHA: Student behavior hindering learning (WLE) | 1.074 (0.036) | 1.065 (0.047) | 1.073 (0.051) |
| TEACHBEHA: Teacher behavior hindering learning (WLE) | 0.957 (0.020) | 0.931 (0.032) | 0.944 (0.046) |
| RATCMP_SCH: Number of computers per school grade, modal | 0.977 (0.108) | 0.911 (0.092) | 0.965 (0.068) |
| STAFFSHORT: Shortage of teaching staff (WLE) | 0.956 (0.025) | 0.995 (0.050) | 0.976 (0.036) |
| PROATCE: Index of proportion of qualified teachers | 1.007 (0.134) | 1.056 (0.166) | 0.653 (0.096) |
| RESPCUR: Responsibility for curriculum | 0.944 (0.048) | 1.026 (0.072) | 0.943 (0.054) |
| SCHAUT: Average school autonomy | 0.666 (0.109) | 0.963 (0.249) | 0.576 (0.240) |
| STRATIO: Student/teacher ratio | 1.007 (0.004) | 1.006 (0.005) | 1.010 (0.003) |
| DISCLISCI: Disciplinary climate in science classes (WLE) | 0.869 (0.081) | 0.868 (0.196) | 1.080 (0.118) |
| STRATUM, sector: Public (reference) | 1 | 1 | 1 |
| Private | 1.317 (0.556) | 1.159 (0.412) | 1.421 (0.466) |
| ST125Q01NA, attended preschool: No (reference) | 1 | 1 | 1 |
| Yes | 1.209 (0.330) | 1.356 (0.200) | 1.115 (0.226) |
| STRATUM, location: Urban (reference) | 1 | 1 | 1 |
| Rural | 1.675 (0.571) | 1.490 (0.555) | 1.624 (0.578) |
Appendix 2
Distribution of average scores by subject and by school and student socioeconomic quartiles
Science
Student quartile | School Q1 | School Q2 | School Q3 | School Q4
Q1 | 337.4 (4.66) | 362.8 (4.50) | 398.3 (4.73) | 431.7 (4.36)
Q2 | 345.8 (4.26) | 377.9 (4.27) | 410.3 (4.29) | 453.4 (7.00)
Q3 | 346.7 (4.55) | 386.3 (4.71) | 420.2 (4.66) | 453.4 (5.64)
Q4 | 355.0 (4.77) | 392.4 (4.43) | 426.0 (4.79) | 467.8 (7.62)

Mathematics
Student quartile | School Q1 | School Q2 | School Q3 | School Q4
Q1 | 335.3 (4.19) | 347.5 (4.827) | 384.3 (5.18) | 423.6 (5.50)
Q2 | 335.9 (5.02) | 367.9 (6.07) | 400.1 (5.65) | 445.5 (5.59)
Q3 | 336.6 (5.56) | 368.9 (5.80) | 408.5 (5.08) | 450.8 (7.51)
Q4 | 345.3 (5.55) | 379.8 (5.80) | 416.7 (5.66) | 458.8 (6.60)

Reading
Student quartile | School Q1 | School Q2 | School Q3 | School Q4
Q1 | 326.4 (4.51) | 352.2 (5.49) | 401.9 (4.69) | 443.1 (6.32)
Q2 | 325.7 (4.44) | 378.6 (6.38) | 420.5 (5.92) | 466.4 (6.04)
Q3 | 330.7 (5.57) | 379.6 (5.33) | 429.9 (5.37) | 477.5 (6.58)
Q4 | 339.1 (5.88) | 393.9 (5.86) | 437.5 (5.58) | 479.8 (7.76)
Note: standard errors in parentheses.
Source: compiled by the author, based on the PISA 2015 Peru database.
References
Agüero, J., & Cueto, S. (2004). Dime con quién estudias y te diré cómo rindes: peer-effects como determinantes del rendimiento escolar. Lima: CIES – Consorcio de Investigación Económica y Social.
Angrist, J., & Pischke, J.-S. (2008). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton: Princeton University Press.
Balarín, M. (2015). Las múltiples formas y efectos de la participación del sector privado en la educación. Lima: Proyecto Forge – Grupo de Análisis para el Desarrollo (Grade).
Benavides, M., León, J., & Etesse, M. (2014). Desigualdades educativas y segregación en el sistema educativo peruano. Una mirada comparativa en las pruebas PISA 2000 y 2009. Lima: Grade.
BID-CIMA. (2016). PISA 2015: ¿cómo le fue a la región? Washington, DC.
Caro, D., & Lenkeit, J. (2012). An analytical approach to study educational inequalities: 10 hypothesis tests in PIRLS 2006. International Journal of Research & Method in Education, 35(1), 3–30.
Carrasco, G. (2007). Calidad y equidad en las escuelas peruanas: un estudio del efecto escuela en la prueba de Matemática – PISA 2000. Lima: Consorcio de Investigación Económica y Social and Centro de Estudios y Promoción del Desarrollo.
Coleman, J. S. (1966). Equality of Educational Opportunity. Washington, DC: U.S. Government Printing Office.
Cordero, J., Crespo, E., & Pedraja, F. (2013). Rendimiento educativo y determinantes según PISA: una revisión de la literatura en España. Revista de Educación, 362, September–December.
Cuenca, R., Carrillo, S., de los Ríos, C., Reátegui, C., & Ortiz, G. (2017). La calidad y equidad de la educación secundaria en el Perú. Working paper 237. Lima: Instituto de Estudios Peruanos.
Cueto, S. (2007). Las evaluaciones nacionales e internacionales de rendimiento escolar en el Perú: balance y perspectivas. In Grupo de Análisis para el Desarrollo (Ed.), Investigación, políticas y desarrollo en el Perú (pp. 405–455). Lima: Grade.
Gamazo, A., Martínez, F., Olmos, S., & Rodríguez, M. (2018). Evaluación de factores relacionados con la eficacia escolar en PISA 2015. Un análisis multinivel. Revista de Educación (España), 379, January–March.
Greene, W. (2018). Econometric Analysis. 8th ed. Pearson Education Ltd.
Guadalupe, C., & Villanueva, A. (2013). PISA 2009/2000 en América Latina: una relectura de los cambios en el desempeño lector y su relación con las condiciones sociales. Apuntes: Revista de Ciencias Sociales, 40(72), 157–192.
Hanushek, E., & Woessmann, L. (2008). The Role of Cognitive Skills in Economic Development. Journal of Economic Literature, 46(3), 607–668.
León, J., & Collahua, Y. (2016). El efecto del nivel socioeconómico en el rendimiento de los estudiantes peruanos: un balance de su efecto en los últimos quince años. Lima: Grade.
León, J., & Youn, M. J. (2016). El efecto de los procesos escolares en el rendimiento en matemática y las brechas de rendimiento debido a diferencias socioeconómicas de los estudiantes peruanos. Revista Peruana de Investigación Educativa, 8, 149–180.
Lindsay, C. (2014). VSELECT: Stata module to perform linear regression variable selection. Boston: Department of Economics, Boston College.
Ministerio de Educación del Perú – Unidad de Medición de la Calidad Educativa. (2006). Evaluación nacional del rendimiento estudiantil 2004. ¿Cómo disminuir la inequidad del sistema educativo peruano y mejorar el rendimiento de sus estudiantes? Factores explicativos más relevantes en la Evaluación Nacional 2004. Lima: Author.
Ministerio de Educación del Perú – Unidad de Medición de la Calidad Educativa. (2017). El Perú en PISA 2015. Informe nacional de resultados.
Miranda, L. (2008). Factores asociados al rendimiento escolar y sus implicancias para la política educativa del Perú. In Análisis de programas, procesos y resultados educativos en el Perú: contribuciones empíricas para el debate (pp. 11–39). Lima: Grade.
Monseur, C., & Crahay, M. (2008). Composition académique et sociale des établissements, efficacité et inégalité scolaires. Une comparaison internationale. Revue Française de Pédagogie, 164, 55–65.
Muelle, L. (2016). Factores de riesgo en el bajo desempeño académico y desigualdad social en el Perú según PISA 2012. Apuntes, XLIII(79), 10–45.
Murillo, J. (2007). Investigación iberoamericana sobre eficacia escolar. Bogotá: Convenio Andrés Bello.
OECD (Organisation for Economic Co-operation and Development). (2011). Against the Odds: Disadvantaged Students Who Succeed in School. Paris: OECD Publishing.
OECD (Organisation for Economic Co-operation and Development). (2016). PISA 2015 Results: Excellence and Equity in Education, vol. I. Paris: OECD Publishing.
OECD (Organisation for Economic Co-operation and Development). (2017). How do schools compensate for socioeconomic disadvantage? PISA in Focus, 76. Paris: OECD Publishing.
Perry, L., & McConney, A. (2010). School socioeconomic composition and student outcomes in Australia: Implications for educational policy. Australian Journal of Education, 54(1), 72–85.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata. 3rd ed. College Station: Stata Press.
Raudenbush, S., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd ed. Thousand Oaks: Sage.
Sirin, S. (2005). Socioeconomic Status and Academic Achievement: A Meta-Analytic Review of Research. Review of Educational Research, 75(3), 417–453.
von Davier, M., Gonzalez, E., & Mislevy, R. (2009). What Are Plausible Values and Why Are They Useful? IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 9–36.
Wiederkehr, V., Darnon, C., Chazal, S., Guimond, S., & Martinot, D. (2015). From Social Class to Self-efficacy: Internalization of Low Social Status Pupils’ School Performance. Social Psychology of Education, 18(4), 1–16.
1 Luis Muelle holds a Ph.D. in Education Economics from the Université de Bourgogne in France. He has served as a researcher at the Instituto Nacional de Investigación y Desarrollo Educativo (Peru), social research director at the Consejo Nacional de Ciencia y Tecnología (Peru), professor at the Centre International d’Études Pédagogiques (France), and a consultant for the European Union. He is currently working as an education consultant, specializing in education economics, assessment of achievement, and social inequality. The author thanks the reviewers for their pertinent comments.
2 At the Latin American level, Peru also participated in the assessments carried out by UNESCO’s Latin American Laboratory for Assessment of the Quality of Education in 1997, 2006, and 2013.
3 Many of the contextual and noncontextual school-level and student-level variables in the database are indices derived from questionnaire responses by means of factor analysis and item response theory (IRT) techniques.
4 Econometricians usually attribute these differences in achievement to “level-2 endogeneity”, that is, correlation between student-level covariates and unobserved school characteristics. Educators prefer to interpret the same differences as contextual effects (Rabe-Hesketh & Skrondal, 2012).
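The distinction can be made concrete with group-mean centering. The minimal sketch below splits the slope of a student-level covariate into a within-school slope and a between-school slope. The difference between the two is the quantity that educators read as a contextual effect and that econometricians treat as a sign of level-2 endogeneity. Some assumptions apply. This is a linear model fitted to simulated data, not the paper's multilevel logistic specification. The function name `within_between_slopes` is hypothetical, and so are all the simulated values.

```python
import numpy as np


def within_between_slopes(y, x, school):
    """OLS fit of y_ij = b0 + bW * (x_ij - xbar_j) + bB * xbar_j + e_ij.

    Returns the within-school slope bW, the between-school slope bB and
    the contextual effect bB - bW (illustrative linear sketch)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Map school identifiers to 0..J-1 and compute each school's mean of x
    _, idx = np.unique(np.asarray(school), return_inverse=True)
    idx = idx.ravel()
    school_mean = np.bincount(idx, weights=x) / np.bincount(idx)
    xbar = school_mean[idx]  # school mean assigned back to each student
    design = np.column_stack([np.ones_like(x), x - xbar, xbar])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    b_within, b_between = coef[1], coef[2]
    return b_within, b_between, b_between - b_within


# Simulated data (hypothetical): 200 schools of 30 students each
rng = np.random.default_rng(0)
school = np.repeat(np.arange(200), 30)
ses = rng.normal(size=200)[school] + rng.normal(size=school.size)
xbar_obs = (np.bincount(school, weights=ses) / np.bincount(school))[school]
# Built-in truth: within slope 0.5, between slope 1.5, contextual effect 1.0
score = 0.5 * (ses - xbar_obs) + 1.5 * xbar_obs + rng.normal(size=school.size)

b_w, b_b, contextual = within_between_slopes(score, ses, school)
```

Suppose a model used only the raw covariate. Its single slope would blend these two quantities. Separating them makes the compositional (school-level) part of the association visible.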
5 The calculations take into account the ten plausible values of the high/low performance binary variable for each subject.
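The paper does not spell out how the ten per-value results are combined. A common approach for plausible values is Rubin's combination rules, and the sketch below assumes it. The pooled estimate is the mean of the ten estimates. The total variance is the mean sampling variance plus the between-value variance inflated by (1 + 1/M).

In PISA, the sampling variance for each plausible value is normally obtained from the replicate weights in the database. Here it is simply taken as an input. The function name and the numbers are hypothetical.

```python
import numpy as np


def combine_plausible_values(estimates, sampling_variances):
    """Pool one estimate per plausible value using Rubin's rules.

    estimates: point estimate computed with each plausible value (M of them)
    sampling_variances: matching squared standard errors
    Returns (pooled estimate, pooled standard error)."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(sampling_variances, dtype=float)
    m = est.size
    pooled = est.mean()
    within = var.mean()                      # average sampling variance
    between = est.var(ddof=1)                # variance across plausible values
    total = within + (1 + 1 / m) * between   # imputation-adjusted total variance
    return pooled, float(np.sqrt(total))


# Hypothetical log-odds estimates from 10 plausible values, each with SE 0.1
pv_estimates = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.50]
pv_variances = [0.1 ** 2] * 10
pooled, pooled_se = combine_plausible_values(pv_estimates, pv_variances)
```

The between-value term is what makes the pooled standard error larger than the one obtained from a single plausible value.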
6 It should also be noted that this index allows students’ socioeconomic levels to be established using percentile cut-offs at 35, 60, and 85 (Ministerio de Educación del Perú – Unidad de Medición de la Calidad Educativa, 2017, p. 57).
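As an illustration, three percentile cut-offs partition an index into four ordered levels. The sketch below shows this using unweighted percentiles of the index's own distribution. Whether the official classification uses sampling weights is not stated here, so that detail is an assumption. The numeric level codes 0 to 3 and the function name are illustrative.

```python
import numpy as np


def socioeconomic_levels(index, cutoffs=(35, 60, 85)):
    """Classify an index into four ordered levels (0 = lowest, 3 = highest)
    using the given percentile cut-offs of its own distribution."""
    index = np.asarray(index, dtype=float)
    thresholds = np.percentile(index, cutoffs)
    # right=True: a value equal to a threshold falls in the lower level
    return np.digitize(index, thresholds, right=True)


# Illustration on 100 evenly spaced index values
levels = socioeconomic_levels(np.arange(1, 101))
counts = np.bincount(levels)  # students per level: [35, 25, 25, 15]
```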
7 The density axis is expressed in probability per unit of the respective socioeconomic index. Thus, the area under each curve integrates to 1, the total probability of the distribution.
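The property described above can be checked numerically. The sketch below builds a Gaussian kernel density estimate of a simulated index and integrates it with the trapezoidal rule; the result should come out very close to 1.

The smoothing method behind the paper's figures is not stated. The kernel, the normal-reference bandwidth rule and the simulated data are all assumptions.

```python
import numpy as np


def gaussian_kde(sample, grid, bandwidth=None):
    """Gaussian kernel density estimate of `sample`, evaluated on `grid`."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = sample.size
    if bandwidth is None:
        # Normal-reference rule of thumb: 1.06 * sd * n^(-1/5)
        bandwidth = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)


def trapezoid_area(x, y):
    """Trapezoidal approximation of the area under y(x)."""
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))


# Simulated socioeconomic index values (not PISA data)
rng = np.random.default_rng(1)
index = rng.normal(loc=-1.0, scale=1.2, size=1000)
grid = np.linspace(index.min() - 4, index.max() + 4, 1001)
density = gaussian_kde(index, grid)   # probability per index unit
area = trapezoid_area(grid, density)  # close to 1
```

Density values are not bounded by 1. They depend on the units of the index, which is why curves for indices on different scales can differ in height while each still encloses an area of 1.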
8 For both students and schools, the UMCISE index yields more favorable values than the OECD ESCS index. This should be taken into account when comparing how each index is associated with student performance.
9 Although both indices seek to measure student socioeconomic status, it should be recalled that the PISA ESCS index differs from Peru’s UMCISE.
10 Equivalently, it measures the share of the total variance that lies at each level, which is why it is also known as the variance partition coefficient (VPC).
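For a two-level model, the VPC is the school-level variance divided by the total variance. The short sketch below computes it. For a linear model, the student-level variance is the estimated residual variance. For a logistic random-intercept model, a common convention works on the latent-response scale, where the student-level variance is fixed at π²/3 (about 3.29).

Whether the paper's figures follow this convention is an assumption. The variance values used below are hypothetical.

```python
import math


def variance_partition_coefficient(sigma2_school, sigma2_student=math.pi ** 2 / 3):
    """Share of total variance lying between schools.

    For a linear model, pass the estimated student-level residual variance.
    For a logistic random-intercept model on the latent-response scale, the
    student-level variance is fixed at pi^2 / 3 (about 3.29), the default."""
    return sigma2_school / (sigma2_school + sigma2_student)


vpc_logit = variance_partition_coefficient(1.0)          # 1 / (1 + 3.29), about 0.23
vpc_linear = variance_partition_coefficient(30.0, 70.0)  # 30 / 100 = 0.30
```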
11 For the set of countries participating in PISA 2015, the intraclass correlation in science is 30.1% (OECD, 2017, p. 227).
12 Logistic regression is a generalized linear model. The logit (log-odds) of the probability of the response is modeled as a linear function of the covariates, and the coefficients are estimated by maximum likelihood rather than by ordinary least squares. The logit transformation thus makes the relationship between the response (dependent) variable and the coefficients linear. The constant is the expected log-odds of low performance when all covariates equal zero.
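A short Python sketch can illustrate how the log-odds scale relates to probabilities. It also shows how an odds ratio, such as those reported in Appendix 1, equals exp(β_k): the factor by which the odds change for a one-unit increase in a covariate. The helper names are illustrative. The baseline probabilities and the odds ratio of 2 are hypothetical, not estimates from the paper.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1 / (1 + math.exp(-eta))


def apply_odds_ratio(p0, odds_ratio):
    """Probability obtained by multiplying the odds of p0 by odds_ratio,
    i.e. adding log(odds_ratio) to the linear predictor."""
    return inv_logit(logit(p0) + math.log(odds_ratio))


# Hypothetical baseline probabilities of low performance, odds ratio of 2
p_low = apply_odds_ratio(0.25, 2.0)   # odds 1/3 -> 2/3, so about 0.40
p_mid = apply_odds_ratio(0.50, 2.0)   # odds 1 -> 2, so about 0.67
```

The same odds ratio shifts the probability by different amounts depending on the baseline (0.25 to 0.40 versus 0.50 to 0.67). This is why logistic results are usually reported as odds ratios rather than as changes in probability.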
13 The same principle applies in multilevel (hierarchical) analysis.
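The display below sketches the multilevel version under the assumption of a two-level random-intercept specification, with students i nested in schools j. The notation is illustrative and may differ from the exact model estimated: y_ij = 1 for a low-performing student, covariates x_kij, and school effect u_j.

```latex
\operatorname{logit} \Pr(y_{ij} = 1 \mid \mathbf{x}_{ij}, u_j)
  = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kij} + u_j,
  \qquad u_j \sim N(0, \sigma_u^2)
```

Here exp(β_k) is the odds ratio for covariate k, holding the school effect u_j fixed. The variance σ_u² captures how much the log-odds of low performance vary between schools.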
14 Many of the 48 variables have more than 10% missing values. Given multicollinearity, and given that dropping every case with a missing value on any variable compounds the loss of observations, including all of them in the model would be very risky.
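To see how missing values compound under listwise deletion, consider an idealized case. Each variable is missing independently of the others at the same rate. The share of complete cases is then the product of the per-variable observed shares, which the sketch below computes and checks by simulation.

The independence assumption, the 10 variables and the 10% rate are all hypothetical. In practice, missingness is often correlated across questionnaire items, which changes the figure.

```python
import numpy as np


def complete_case_share(missing_rates):
    """Expected share of cases kept by listwise deletion when each variable
    is missing independently of the others at the given rate."""
    share = 1.0
    for rate in missing_rates:
        share *= 1 - rate
    return share


# Simulated check: 10 variables, each missing completely at random in 10% of cases
rng = np.random.default_rng(2)
missing = rng.random((100_000, 10)) < 0.10
observed_share = float((~missing.any(axis=1)).mean())
expected_share = complete_case_share([0.10] * 10)  # 0.9 ** 10, about 0.35
```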
15 Moreover, the variance inflation factor (VIF) is just 1.6, below the commonly used threshold of 5 for diagnosing collinearity between the variables.
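The diagnostic can be sketched directly from its definition. Each column is regressed on all the others plus an intercept, and the VIF is 1/(1 − R²). The note does not say how the reported value was computed or whether it is a mean across variables, so the function below is only an illustration. It is run on hypothetical simulated data, not on the study's covariates.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    of X on the remaining columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r_squared = 1 - resid.var() / target.var()
        vifs[j] = 1 / (1 - r_squared)
    return vifs


rng = np.random.default_rng(3)
a, b = rng.normal(size=(2, 5000))
c = a + b + 0.1 * rng.normal(size=5000)  # almost a linear combination of a and b
vif_low = variance_inflation_factors(np.column_stack([a, b]))      # both close to 1
vif_high = variance_inflation_factors(np.column_stack([a, b, c]))  # all well above 10
```

A mean VIF, as reported by some software, is simply `vifs.mean()`. Thresholds of 5 or 10 are rules of thumb rather than formal tests.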
16 At the time of sampling in Peru, 75.3% of students were at their age-appropriate grade level (4th or 5th grade of secondary school).