Cite this article: |
- LI Ying-hua, LIU Yan, QIN Yong-song. Comparison of Methods to Handle Missing Values in Linear Models with Missing Data[J]. Guangxi Sciences, 2009, 16(4): 400-402, 413.
|
|
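Abstract: |
In a linear model whose response variable is missing at random, simulations in the R statistical software are used to compare the estimates of the regression coefficients, the mean of the response variable, the distribution function of the response variable, and the quantiles of the response variable obtained by the complete-case method, deterministic imputation, and fractional linear regression imputation, with the standard error (SE) as the criterion. The results show that, for every method except deterministic imputation, the SE decreases as the sample size increases, so larger samples give more accurate estimates. The missing probability also affects accuracy: the larger the missing probability, the larger the SE and the less accurate the estimate. In addition, under fractional linear regression imputation, the results for J=5 are always better than those for J=1, which indicates that the estimation accuracy improves as J increases. |
Key words: linear model; missing data; deterministic imputation; fractional imputation |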
DOI: |
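Received: 2008-11-17 |
Funding: Supported by the National Natural Science Foundation of China (10661003), the Guangxi Science Foundation (0728092), and the Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education ([2004]527) |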
|
Comparison of Methods to Handle Missing Values in Linear Models with Missing Data |
LI Ying-hua¹, LIU Yan², QIN Yong-song¹
|
(1. School of Mathematical Sciences, Guangxi Normal University, Guilin, Guangxi, 541004, China; 2. Foreign Language School Attached to Guangxi Normal University, Guilin, Guangxi, 541004, China) |
Abstract: |
When the response variable is missing at random in a linear model, three methods of handling the missing values are considered: deleting cases with missing values, deterministic imputation, and fractional linear regression imputation. Based on these methods, estimators of the regression parameters and of the mean, the distribution function, and the quantiles of the response variable are studied. Simulations in the R statistical software are conducted to compare the performance of the estimators, using the standard error (SE) as the criterion. The results show that, for every method except deterministic imputation, the SE decreases and the estimates become more accurate as the sample size increases. The SE increases and the estimators become less accurate as the response probability decreases. Under fractional regression imputation, the estimators are more accurate at J=5 than at J=1, which shows that their accuracy increases with J. |
Key words: linear model; missing data; deterministic imputation; fractional imputation |
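
The comparison described in the abstract can be sketched as a small Monte Carlo experiment. The sketch below is written in Python rather than the R used by the authors, and everything in it is an assumption made for illustration: the model y = 1 + 2x + e, the constant missing probability, the function names, and the reading of SE as the standard deviation of the estimates across replications. Fractional imputation is taken in its common form, where each missing response receives J imputed values (the fitted value plus a residual resampled from the respondents), each weighted 1/J. Only the mean of the response variable is estimated here.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def estimate_means(n, p_missing, J, rng):
    """One replication: (complete-case, deterministic, fractional) mean estimates."""
    # Hypothetical data-generating model, chosen only for illustration.
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
    # Each response is missing independently with probability p_missing.
    observed = [rng.random() >= p_missing for _ in range(n)]

    xo = [x for x, d in zip(xs, observed) if d]
    yo = [y for y, d in zip(ys, observed) if d]
    a, b = fit_line(xo, yo)                      # fit on respondents only
    resid = [y - (a + b * x) for x, y in zip(xo, yo)]

    cc = statistics.fmean(yo)                    # complete-case estimate
    det_total = frac_total = sum(yo)
    for x, d in zip(xs, observed):
        if not d:
            pred = a + b * x
            det_total += pred                    # deterministic: fitted value
            # fractional: J draws of fitted value + resampled residual, weight 1/J
            draws = [pred + rng.choice(resid) for _ in range(J)]
            frac_total += statistics.fmean(draws)
    return cc, det_total / n, frac_total / n


def monte_carlo_se(n, p_missing, J, reps=500, seed=1):
    """Standard deviation of each estimator over `reps` simulated samples."""
    rng = random.Random(seed)
    runs = [estimate_means(n, p_missing, J, rng) for _ in range(reps)]
    return tuple(statistics.stdev(col) for col in zip(*runs))


if __name__ == "__main__":
    for n in (50, 200):
        for p in (0.1, 0.3):
            print(n, p, monte_carlo_se(n, p, J=5))
```

Each printed tuple holds the SE of the complete-case, deterministic, and fractional estimates of the mean, in that order. For the mean, the difference between J=1 and J=5 can be small relative to Monte Carlo noise, so a larger `reps` may be needed before it shows up; the distribution function and quantile estimators studied in the paper are not covered by this sketch.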