Journal of Management Information and Decision Sciences (Print ISSN: 1524-7252; Online ISSN: 1532-5806)

Research Article: 2022 Vol: 25 Issue: 4

Estimation of population mean using ratio type imputation technique with linear combination of two auxiliary variable under two-phase sampling

Deepjan Gohain, North Eastern Regional Institute of Science and Technology

Krishnajyoti Nath, North Eastern Regional Institute of Science and Technology

Singh BK, North Eastern Regional Institute of Science and Technology

Citation Information: Gohain, D., Nath, K., & Singh, B.K. (2022). Estimation of population mean using ratio type imputation technique with linear combination of two auxiliary variable under two-phase sampling. Journal of Management Information and Decision Sciences, 25(S5), 1-20.

Abstract

The present paper proposes four generalized classes of estimators for the population mean under a two-phase sampling design using auxiliary information, and derives expressions for their bias and mean square error. Imputation techniques of this type are used in decision-science fields to obtain better results. In addition, theoretical results and empirical studies based on datasets from the classical statistical literature show the superiority of the proposed estimators over existing estimators.

Keywords

Imputation; Bias; Mean Square Error (MSE); Missing Data; Large Sample Approximation; Simple Random Sampling without Replacement (SRSWOR).

Introduction

Survey responses are often incomplete because a sampling unit refuses to participate, cannot respond or cannot be contacted, or because some of the collected information is accidentally lost. To deal with missing data effectively, Kalton et al. (1981) and Sande (1979) suggested imputation methods that make an incomplete data set structurally complete and its analysis simple. Lee et al. (1994) and Lee et al. (1995) used information on an auxiliary variable for the purpose of imputation. Later, Singh and Horn (2000) introduced a compromised method of imputation based on auxiliary variables. Ahmed et al. (2006) discussed several new imputation-based estimators that use information on an auxiliary variate and compared their performance with the mean method of imputation.

Singh and Horn (2000), Wright and Capps (2011), Singh and Gogoi (2017), Singh and Nath (2018b, 2019) and Joyce et al. (2021) discussed, among other topics, the design of mixed sampling plans based on IPD and several imputation methods for missing data when estimating the population mean under a two-phase sampling scheme.

The objective of the present work is to provide estimators that are more efficient than the existing ones when the population parameters of the auxiliary variables are unknown.

Notations

Let equation be a finite population of size N. Let Y be the study variable and X, Z the auxiliary variables, where $\bar{Y}$, $\bar{X}$ and $\bar{Z}$ are the population means of Y, X and Z respectively.

Consider a first-phase sample S1 of size equation drawn from the population equation by the SRSWOR method, and a second sample S of size equation drawn either from equation or from S1.

Case-I: when the second sample S is drawn from S1, i.e. S depends on the first sample S1 (denoted by design I), as in Figure 1.

Case-II: when the second sample S is drawn from equation, i.e. S is independent of the first sample S1 (denoted by design II), as in Figure 1.
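The two designs can be illustrated with a minimal Python sketch. The function name `two_phase_sample` and the size arguments `n1` and `n` are hypothetical labels chosen for this illustration, since the sample-size symbols are not legible in this text; the sketch only shows where the second sample is drawn from in each case.

```python
import random

def two_phase_sample(population, n1, n, design, seed=None):
    """Draw (S1, S) by SRSWOR under design "I" or design "II"."""
    rng = random.Random(seed)
    # First phase: SRSWOR of n1 units from the whole population.
    s1 = rng.sample(population, n1)
    if design == "I":
        # Design I: the second sample is a subsample of S1.
        s = rng.sample(s1, n)
    elif design == "II":
        # Design II: the second sample is drawn from the population,
        # independently of S1.
        s = rng.sample(population, n)
    else:
        raise ValueError("design must be 'I' or 'II'")
    return s1, s

units = list(range(1, 101))   # unit labels 1..N with N = 100 (toy value)
s1, s = two_phase_sample(units, n1=40, n=15, design="I", seed=1)
print(set(s) <= set(s1))      # True: S is nested in S1 under design I
```

Under design II the same call with `design="II"` may return units in S that do not belong to S1, which is why the expected values differ between the two cases.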

Let the second sample S contain equation responding units forming a subspace R and equation non-responding units forming the subspace Rc, such that S = R ∪ Rc. For every unit i ∈ R, the value y_i is observed. For every unit i ∈ Rc, the y_i values are missing and imputed values are computed. The equation of the auxiliary variables are used as a source of imputation for missing data when i ∈ Rc, assuming that in S and S1 the data equation are known.

equation

equation

equation

equation

equation: the population correlation coefficients between X and Y, Y and Z, and Z and X, respectively.

equation: the coefficients of variation of X, Y and Z, respectively.

equation

Now, using the concept of two-phase sampling and denoting by E1 and E2 the expectations over the first and second phases respectively, we have the following expected values.

Case I: when S is drawn from S1

equation

Similarly,

equation

Similarly,

equation

Similarly,

equation

Case II: when S is drawn from equation

equation

Similarly equation

equation

Similarly,

equation

equation

Similarly,

equation

Some Existing Imputation Techniques

Mean Method of Imputation

Under Mean method of imputation

equation

Using the above, the point estimator of the population mean is equation

The bias and variance are given by

equation

Ratio Method of Imputation (Lee et al., 1994)

Under Ratio Method of Imputation

equation

Using the above, the point estimator of the population mean $\bar{Y}$ is

equation

The bias and MSE are given by

equation
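The mean and ratio methods can be sketched in a few lines of Python. The formulas in this text are not legible, so the sketch assumes the standard single-sample forms: a missing y value is replaced by the respondent mean (mean method) or by b·x_i with b = Σ_R y_i / Σ_R x_i (ratio method). The two-phase versions used in the paper may differ, for example by using a first-phase mean of X. The function names are hypothetical.

```python
def mean_impute(y, responded):
    """Replace each missing y value by the mean of the responding units."""
    resp = [yi for yi, ok in zip(y, responded) if ok]
    ybar_r = sum(resp) / len(resp)
    return [yi if ok else ybar_r for yi, ok in zip(y, responded)]

def ratio_impute(y, x, responded):
    """Replace each missing y value by b * x_i, with b = sum_R(y) / sum_R(x)."""
    sum_y = sum(yi for yi, ok in zip(y, responded) if ok)
    sum_x = sum(xi for xi, ok in zip(x, responded) if ok)
    b = sum_y / sum_x
    return [yi if ok else b * xi for yi, xi, ok in zip(y, x, responded)]

def mean(values):
    return sum(values) / len(values)

# Toy sample of n = 5 units; None marks a missing response on y.
y = [10.0, 12.0, None, 9.0, None]
x = [5.0, 6.0, 7.0, 4.0, 8.0]
responded = [yi is not None for yi in y]

ybar_mean = mean(mean_impute(y, responded))       # equals the respondent mean
ybar_ratio = mean(ratio_impute(y, x, responded))  # equals ybar_r * xbar_n / xbar_r
```

Under these assumed forms, the mean of the completed data reduces to the respondent mean for the mean method, and to the respondent mean multiplied by the ratio of the full-sample mean of X to the respondent mean of X for the ratio method.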

Compromised Method of Imputation (Singh & Horn, 2000)

Under this method of imputation

equation

Using the above, the point estimator of the population mean $\bar{Y}$ is

equation

where β is a constant chosen so that the MSE of $\bar{y}_{comp}$ is minimum.

The optimum bias and MSE are given by

equation

Exponential Ratio Method of Imputation in two-phase sampling (Pandey et al., 2015)

Under this method of imputation

equation

equation

Using the above, the point estimator of the population mean $\bar{Y}$ is

equation

where α is a constant chosen so that the MSE of equation is minimum.

The optimum Bias and MSE are given by

equation

equation

Dual to Ratio Method of Imputation in two-phase sampling (Singh & Nath, 2018a)

Under this method of imputation

equation

equation

Using the above, the point estimator of the population mean equation is

equation

where β is a constant chosen so that the MSE of equation is minimum.

The optimum Bias and MSE are given by

equation

Proposed Imputation Strategies

Motivated by the above imputation methods, we propose the following multivariate ratio-type imputation methods for the population mean in two-phase sampling.

Imputation Method equation

The imputation scheme is as follows:

equation

Imputation Method equation

The imputation scheme is as follows

equation

Imputation Method equation

The imputation scheme is as follows

equation

Imputation Method equation

The imputation scheme is as follows

equation

Point estimators for the population mean equation under the four proposed imputation methods equation can easily be deduced. The point estimators are:

equation

In general, the above four imputation methods can be defined as equation

The imputation scheme is as follows

equation

Point estimator for population mean equation

equation

where α1 and α2 are suitably chosen constants, determined so that the MSE of the point estimator is minimum, and equation

Expanding equation in terms of equation and retaining terms up to the first order of approximation, we have

equation

equation

Where,

equation

equation

Properties of Proposed Estimator

The bias, MSE and minimum MSE of the proposed point estimators are derived in the following theorems.

Theorem 1

The biases of the estimators equation under design I and design II, up to the first order of approximation, are:

equation

equation

Proof: Taking expectation on both sides of equation (1) we have

equation

Putting the expected values under design I we have

equation

Putting the expected values under design II we have

equation

Theorem 2

The MSEs of the estimators equation under design I and design II, up to the first order of approximation, are:

equation

Proof: Squaring both sides of (3) and taking expectations, we have

equation

Putting the expected values under design I we have

equation

equation

Putting the expected values under design II we have

equation

The optimum values of equation are obtained by minimizing equation, given in equations (6) and (7), using the method of maxima and minima. We have:

equation
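The minimization step can be illustrated generically. The sketch below assumes, as is typical of first-order approximations, that the MSE is a quadratic function of the two constants, MSE(t1, t2) = c - 2(b1 t1 + b2 t2) + a11 t1² + 2 a12 t1 t2 + a22 t2², where t1 and t2 stand for α1 and α2. The coefficients c, b and a are hypothetical placeholders, not the paper's expressions, which are not legible in this text.

```python
def minimize_quadratic_mse(c, b, a):
    """Minimize c - 2*(b1*t1 + b2*t2) + a11*t1^2 + 2*a12*t1*t2 + a22*t2^2."""
    (a11, a12), (_, a22) = a
    b1, b2 = b
    det = a11 * a22 - a12 * a12
    if det <= 0 or a11 <= 0:
        raise ValueError("quadratic form must be positive definite")
    # Setting both partial derivatives to zero gives the linear system
    # A t = b, solved here by Cramer's rule.
    t1 = (b1 * a22 - a12 * b2) / det
    t2 = (a11 * b2 - a12 * b1) / det
    # At the optimum the quadratic reduces to c - b^T t.
    min_mse = c - (b1 * t1 + b2 * t2)
    return (t1, t2), min_mse

# Toy coefficients, chosen only to exercise the function.
(t1_opt, t2_opt), mse_min = minimize_quadratic_mse(
    c=5.0, b=(1.0, 2.0), a=((2.0, 1.0), (1.0, 3.0))
)
```

The positive-definiteness check corresponds to the second-order condition of the method of maxima and minima: without it, the stationary point need not be a minimum.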

Substituting the optimum values of equation under design I and design II into equations (2) and (3) and solving for equation, we have

equation

Substituting the optimum values of equation under designs I and II into equations (4) and (5), we have

equation

Theorem 3

The estimator equation is unbiased for the optimum values of equation under designs I and II.

Proof: Putting the optimum values of equation under design I in equation (6), we have

equation

Similarly, putting the optimum values of equation under design II in equation (7), we have

equation

Comparison

In this section we derive the conditions under which the suggested estimator is superior to the existing estimators under design I and design II. To compare the different estimators, we use the following theorem on multiple correlation coefficients.
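The theorem itself is not legible in this text. A standard identity for the squared multiple correlation of Y on X and Z, assumed here to be the result intended, is R² = (ρ_yx² + ρ_yz² - 2 ρ_yx ρ_yz ρ_xz) / (1 - ρ_xz²), and it satisfies R² ≥ max(ρ_yx², ρ_yz²). A minimal Python sketch, with a hypothetical function name:

```python
def multiple_r_squared(rho_yx, rho_yz, rho_xz):
    """Squared multiple correlation coefficient of Y on (X, Z)."""
    if abs(rho_xz) >= 1:
        raise ValueError("X and Z must not be perfectly correlated")
    num = rho_yx ** 2 + rho_yz ** 2 - 2 * rho_yx * rho_yz * rho_xz
    return num / (1 - rho_xz ** 2)

# Toy correlations, chosen only for illustration.
r2 = multiple_r_squared(0.7, 0.5, 0.3)
print(r2 >= max(0.7 ** 2, 0.5 ** 2))   # True
```

The inequality is what makes comparisons of this kind work: an estimator whose minimum MSE depends on R² cannot do worse than one that depends on a single squared correlation.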

Comparison with Mean Method of Imputation

equation

equation is always more efficient than equation in design I and design II.

Comparison with Ratio Method of Imputation

equation

equation is always more efficient than equation in design I and design II.

Comparison with Compromised Method of Imputation

equation

Comparison with Exponential Ratio Method of Imputation

equation

equation

Comparison with dual to Ratio Method of Imputation

equation

equation is always more efficient than equation in design I and design II.

Empirical Study

To examine the performance of the proposed estimator of the population mean in two-phase sampling, we have considered the following three populations (Tables 1-10).

Table 1: MSE of the different estimators under design I
Point estimator   Population I   Population II   Population III
equation          2.026693       2559.906609     10.369737
equation          1.875253       1631.420886     8.955755
equation          1.757726       1630.937426     8.752709
equation          1.324985       1084.832174     7.845193
equation          1.152552       448.613009      6.731424
equation          1.090778       446.556316      6.273775

Table 2: MSE of the different estimators under design II
Point estimator   Population I   Population II   Population III
equation          2.026693       2559.906609     10.369737
equation          1.875253       1631.420886     8.955755
equation          1.757726       1630.937426     8.752709
equation          1.117529       828.239053      6.4115601
equation          1.117496       405.529871      6.356754
equation          1.053244       403.431209      5.851978

Table 3: PRE of the different estimators with respect to equation under design I
Point estimator   Population I   Population II   Population III
equation          100.000        100.000         100.000
equation          106.686        100.029         102.319
equation          141.530        150.385         114.156
equation          162.704        363.659         133.044
equation          171.918        365.334         142.749

Table 4: PRE of the different estimators with respect to equation under design II
Point estimator   Population I   Population II   Population III
equation          100.000        100.000         100.000
equation          106.686        100.029         102.319
equation          167.803        196.975         139.681
equation          167.808        402.294         140.886
equation          178.045        404.386         153.038

Table 5: PRE of the different estimators with respect to equation under design I
Point estimator   Population I   Population II   Population III
equation          100.000        100.000         100.000
equation          132.660        150.340         111.568
equation          152.507        363.551         130.027
equation          161.144        365.225         139.513

Table 6: PRE of the different estimators with respect to equation under design II
Point estimator   Population I   Population II   Population III
equation          100.000        100.000         100.000
equation          157.287        196.916         136.514
equation          157.292        402.174         137.691
equation          166.887        404.267         149.568

Table 7: PRE of the different estimators with respect to equation under design I
Point estimator   Population I   Population II   Population III
equation          100.000        100.000         100.000
equation          114.961        241.819         116.546
equation          121.472        242.933         125.047

Table 8: PRE of the different estimators with respect to equation under design II
Point estimator   Population I   Population II   Population III
equation          100.000        100.000         100.000
equation          100.003        204.236         100.086
equation          106.103        205.299         109.562

Table 9: PRE of the different estimators with respect to equation under design I
Point estimator   Population I   Population II   Population III
equation          100.000        100.000         100.000
equation          105.666        100.460         107.295

Table 10: PRE of the different estimators with respect to equation under design II
Point estimator   Population I   Population II   Population III
equation          100.000        100.000         100.000
equation          106.100        100.495         108.626
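The PRE entries can be recomputed from the MSE tables. The sketch below assumes that PRE is the ratio of the reference estimator's MSE to the compared estimator's MSE, multiplied by 100; this definition is not stated in the legible text, but it agrees with the tabulated values. Rows are identified only by their position in Table 1, because the estimator symbols are not legible here.

```python
# MSE values copied from Table 1 (design I), Population I, in table row order.
mse_design_1_pop_1 = [2.026693, 1.875253, 1.757726, 1.324985, 1.152552, 1.090778]

def pre(mse_reference, mse_estimator):
    """Percent relative efficiency of an estimator against a reference."""
    return 100.0 * mse_reference / mse_estimator

# The second row of Table 1 is the reference estimator of Table 3.
reference = mse_design_1_pop_1[1]
for row, mse in enumerate(mse_design_1_pop_1[1:], start=2):
    print(f"row {row}: PRE = {pre(reference, mse):.3f}")
```

Run on the Population I column, this agrees with the first column of Table 3 up to rounding in the last digit; changing the reference row gives the corresponding columns of Tables 5, 7 and 9.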

Population I (Cochran, 1977)

Y: Number of placebo children

X: Number of paralytic polio cases in the placebo group

Z: Number of paralytic polio cases in the ‘not inoculated’ group

equation

Population II (Murthy, 1967)

Y: Area under wheat in 1964

X: Area under wheat in 1963

Z: Cultivated area in 1961

equation

Population III (Anderson, 2003)

Y: Head length of second son

X: Head length of first son

Z: Head breadth of first son

equation

Conclusion

The above tables show that the suggested estimators have smaller Mean Square Error (MSE) than the existing estimators, both theoretically and empirically, under the dependent (design I) and independent (design II) cases. In addition, the bias of the proposed estimator vanishes at the optimum values of α1 and α2. Therefore, the proposed estimator is preferable to the existing estimators.

References

Ahmed, M.S., Al-Titi, O., Al-Rawi, Z., & Abu-Dayyeh, W. (2006). Estimation of a population mean using different imputation methods. Statistics in Transition, 7(6), 1247-1264.

Anderson, T.W. (2003). An introduction to multivariate statistical analysis (3rd ed.). John Wiley & Sons, New York.

Cochran, W.G. (1977). Sampling techniques. John Wiley & Sons.

Joyce, V.J., Merlin, G.S., Edna, K.R.J., & Fenella, S. (2021). Designing mixed sampling plan based on IPD. Journal of Management Information and Decision Sciences, 24, 1-6.

Kalton, G., Kasprzyk, D., & Santos, R. (1981). Issues of nonresponse and imputation in the survey of income and program participation. In Current topics in survey sampling (pp. 455-480). Academic Press.

Lee, H., Rancourt, E., & Särndal, C.E. (1994). Experiments with variance estimation from survey data with imputed values. Journal of Official Statistics, 10(3), 231-243.

Lee, H., Rancourt, E., & Särndal, C.E. (1995). Variance estimation in the presence of imputed data for the generalized estimation system. Proc. of the American Statist. Assoc. (Social Survey Research Methods Section), 384-389.

Murthy, M.N. (1967). Sampling: Theory and methods. Statistical Pub. Society.

Pandey, R., Thakur, N.S., & Yadav, K. (2015). Estimation of population mean using exponential ratio type imputation method under survey non-response. Journal of the Indian Statistical Association, 53(1), 89-107.

Sande, I.G. (1979). A personal view of hot deck approach to automatic edit and imputation. Journal Imputation Procedures. Survey Methodology, 5, 238-246.

Singh, B.K., & Gogoi, U. (2017). Estimation of population mean using exponential dual to ratio type compromised imputation for missing data in survey sampling. J Stat Appl Pro, 3, 515-522.

Singh, B.K., & Nath, K. (2018a). Estimation of population mean using ratio cum product compromised method of imputation in two-phase sampling scheme in sample survey. Asian Journal of Mathematics & Statistics, 11(1), 27-39.

Singh, B.K., & Nath, K. (2018b). Some imputation methods in two-phase sampling scheme for estimation of population mean. Research & Reviews: Journal of Statistics (RRJoST), 7(1), 1-16.

Singh, B.K., & Nath, K. (2019). Generalized class of dual to product cum dual to ratio estimator for population mean with imputation of missing data in two-phase sampling scheme. International Journal of Mathematics and Statistics (IJMS), 21, 86-96.

Singh, S., & Horn, S. (2000). Compromised imputation in survey sampling. Metrika, 51(3), 267-276.

Wright, K., & Capps, C. (2011). A survey of information systems development project performance. Academy of Information and Management Sciences Journal, 14(1), 87-105.

Received: 10-Feb-2022, Manuscript No. JMIDS-22-11218; Editor assigned: 15-Feb-2022, PreQC No. JMIDS-22-11218(PQ); Reviewed: 07-Mar-2022, QC No. JMIDS-22-11218; Revised: 29-Mar-2022, Manuscript No. JMIDS-22-11218 (R); Published: 05-Apr-2022
