Rules for determining an appropriate research sample size

Enough data are needed to provide reliable estimates of the correlations. Use at least 50 cases and at least 10 to 20 times as many cases as there are independent variables (IVs). As the number of IVs increases, more inferential tests are conducted (if each predictor is tested), so more data are needed. Otherwise, the estimates of the regression line are probably unstable and unlikely to replicate if the study is repeated.

Green (1991) and Tabachnick and Fidell (2012, p. 123) suggest:

50 + 8(k) for testing an overall regression model and
104 + k when testing individual predictors (where k is the number of IVs)
These sample size suggestions are based on detecting a medium effect size (β >= .20), with critical α <= .05, with power of 80%.
To be more accurate, conduct study-specific power and sample size calculations (e.g., use an a priori sample size calculator for multiple regression; note that this calculator uses f² for the anticipated effect size; see the Formulas link for how to convert R² to f²).
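
Power calculators of this kind typically ask for Cohen's f² rather than R². As a minimal sketch, the standard conversion is f² = R² / (1 − R²). The function names below are hypothetical and chosen only for illustration; they are not part of any particular calculator.

```python
def r2_to_f2(r_squared: float) -> float:
    """Convert a squared multiple correlation R² to Cohen's f²."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R² must lie in the interval [0, 1)")
    return r_squared / (1.0 - r_squared)


def f2_to_r2(f_squared: float) -> float:
    """Inverse conversion: Cohen's f² back to R²."""
    if f_squared < 0.0:
        raise ValueError("f² must be non-negative")
    return f_squared / (1.0 + f_squared)


if __name__ == "__main__":
    # R² = .13 gives an f² of roughly .15, Cohen's conventional "medium" effect.
    print(round(r2_to_f2(0.13), 3))
```

For example, an anticipated R² of .13 converts to an f² of about .15, which is the value usually labeled a medium effect.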

“Regarding the sample size question, the researcher generally would not factor analyze a sample of fewer than 50 observations, and preferably the sample size should be 100 or larger. As a general rule, the minimum is to have at least five times as many observations as the number of variables to be analyzed, and the more acceptable sample size would have a 10:1 ratio” (Hair et al., 2014, p.100).

“Simple regression can be effective with a sample size of 20, but maintaining power at .80 in multiple regression requires a minimum sample of 50 and preferably 100 observations for most research situations. The minimum ratio of observations to variables is 5:1, but the preferred ratio is 15:1 or 20:1, which should increase when stepwise estimation is used” (Hair et al., 2014, p.172)

“Green (1991) provides a thorough discussion of these issues and some procedures to help decide how many cases are necessary. Some simple rules of thumb are N ≥ 50 + 8*m (where m is the number of IVs) for testing the multiple correlation and N ≥ 104 + m for testing individual predictors” (Tabachnick and Fidell, 2012, p. 123).

“The results did not support the use of rules-of-thumb that simply specify some constant (e.g., 100 subjects) as the minimum number of subjects or a minimum ratio of number of subjects (N) to number of predictors (m). Some support was obtained for a rule-of-thumb that N ≥ 50 + 8*m for the multiple correlation and N ≥ 104 + m for the partial correlation. However, the rule-of-thumb for the multiple correlation yields values too large for N when m ≥ 7, and both rules-of-thumb assume all studies have a medium-size relationship between criterion and predictors” (Green, 1991, p. 499).
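
The two rules of thumb quoted above are simple arithmetic, so a short sketch may help when planning a study. The function name and the dictionary keys are hypothetical. Both rules assume a medium effect size, as Green (1991) cautions, so the output is a rough starting point rather than a substitute for a power analysis.

```python
def green_min_n(num_ivs: int) -> dict:
    """Rule-of-thumb minimum N for a regression with `num_ivs` predictors."""
    if num_ivs < 1:
        raise ValueError("need at least one independent variable")
    return {
        # N >= 50 + 8*m for testing the multiple correlation (overall model)
        "overall_model": 50 + 8 * num_ivs,
        # N >= 104 + m for testing individual predictors
        "individual_predictors": 104 + num_ivs,
    }


if __name__ == "__main__":
    for m in (3, 5, 10):
        print(m, green_min_n(m))
```

With 5 IVs, for instance, the rules give 50 + 8*5 = 90 cases for the overall model and 104 + 5 = 109 cases for individual predictors.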

Quantitative analysis requires the sample to reach a minimum size for the study to be reliable. The sample size used in a study is based on the requirements of exploratory factor analysis (EFA) and multiple regression. Two commonly used formulas must be applied:

– Formula 1: For exploratory factor analysis (EFA), the minimum sample size is five times the total number of observed variables; this is a suitable sample size for studies that use factor analysis (Comrey, 1973; Roger, 2006).

n=5*m

where m is the number of questions in the questionnaire.

– Formula 2: For multiple regression analysis, the minimum sample size is given by n = 50 + 8*m, where m is the number of independent variables (Tabachnick and Fidell, 2012, p. 123). Note that m here is the number of independent constructs, not the number of individual questions about them in the questionnaire.

Ideally, the chosen sample size should satisfy both formulas above; as a rule, it is better to have too many cases than too few.
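
As a minimal sketch, the two formulas can be combined by computing each minimum and taking the larger one. The function and parameter names are hypothetical: `num_items` stands for the number of questionnaire items entering the EFA, and `num_ivs` for the number of independent constructs in the regression.

```python
def min_sample_size(num_items: int, num_ivs: int) -> dict:
    """Smallest n satisfying both the EFA rule and the regression rule."""
    if num_items < 1 or num_ivs < 1:
        raise ValueError("both counts must be positive")
    efa_min = 5 * num_items              # Formula 1: n = 5*m (m = observed items)
    regression_min = 50 + 8 * num_ivs    # Formula 2: n = 50 + 8*m (m = IVs)
    return {
        "efa": efa_min,
        "regression": regression_min,
        "required": max(efa_min, regression_min),
    }


if __name__ == "__main__":
    # Hypothetical questionnaire: 25 items measuring 5 independent constructs.
    print(min_sample_size(num_items=25, num_ivs=5))
```

For a hypothetical questionnaire with 25 items and 5 independent constructs, Formula 1 gives 125 and Formula 2 gives 90, so at least 125 usable responses would be needed.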

References

Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499‐510.

Tabachnick, B. G., & Fidell, L. S. (2012). Using multivariate statistics (6th ed.). Pearson.

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2014). Multivariate data analysis (7th ed.). Harlow: Pearson Education Limited.

Comrey, A. L. (1973). A first course in factor analysis. New York: Academic.

Roger Bove (2006). Estimation and Sample Size Determination for Finite Populations. 10th edition, CD Rom Topics, Section 8.7, West Chester University of Pennsylvania.