Bivariate Model for the Saber11 Tests in Tolima Department (Colombia)
Abstract
In many applications, data are restricted to the interval (0, 1), such as percentages and proportions, and can be explained by other variables through a regression model in which the response variable follows a beta distribution. In addition, some pairs of such variables are dependent, for example mathematics and language performance in the Saber11 state test in Tolima Department (Colombia) in 2016. The theory of copula functions offers a way to model the dependence between random variables with given marginal distributions, making it possible to estimate different measures of association and to construct different estimation methods. To analyze this type of data, we use a bivariate model based on copula functions for data in the interval (0, 1). Properties of the fitted models were verified, and different estimation methods were compared using the copula and VineCopula packages of the R software in order to establish the best model for analyzing this type of data. Simulated data were used to carry out this process, and the models were applied to real data on performance in critical reading and mathematics of students between 14 and 24 years of age.
Keywords
Bivariate models, copula functions, dependence between random variables.