Be aware that you will likely have to modify your svyset command to work with the merged data set. Also, before you start using the multiply imputed data, you should look at the help file for mim to see whether the procedure that you want to use is supported by mim. We can use the estat size command to get the unweighted and weighted sample sizes. Imputation flags are variables that are added to the imputed data sets to tell the user which cases have imputed values.
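The two numbers that estat size reports for a group are simple to state: the count of sampled cases, and the sum of their sampling weights. The Python sketch below computes both on a few made-up records. The names `records` and `subpop_sizes` are hypothetical, and this only mirrors the arithmetic, not how Stata computes it internally.

```python
# Hypothetical records: (sampling weight, in-subpopulation flag).
# The values are made up for illustration only.
records = [
    (120.0, True),
    (80.0, False),
    (95.5, True),
    (150.0, True),
    (60.0, False),
]

def subpop_sizes(rows):
    """Return (unweighted n, weighted size) for rows flagged as in the subpopulation."""
    n = sum(1 for _, flag in rows if flag)             # number of sampled cases
    n_weighted = sum(w for w, flag in rows if flag)    # sum of their weights
    return n, n_weighted

n, n_hat = subpop_sizes(records)
print(n, n_hat)  # 3 365.5
```

A small unweighted count paired with a large weighted size is a warning that estimates for that group rest on few respondents.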

We can also use the over option to get estimates for all categories of the variable. You will want to know this so that you can assess how reliable estimates involving that variable are.
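Conceptually, the over option repeats the same weighted estimate once per category. The sketch below shows only the point estimates (a weighted mean per category) on hypothetical rows; the function name `weighted_means_by_group` is ours, and the standard errors that Stata also reports would need the full design information, which this sketch leaves out.

```python
from collections import defaultdict

# Hypothetical rows: (category of the "over" variable, outcome value, sampling weight).
rows = [
    ("female", 10.0, 2.0),
    ("female", 20.0, 1.0),
    ("male", 30.0, 3.0),
    ("male", 40.0, 1.0),
]

def weighted_means_by_group(data):
    """Return {category: sum(w*y) / sum(w)} for every category present in data."""
    num = defaultdict(float)  # running sum of w*y per category
    den = defaultdict(float)  # running sum of w per category
    for group, y, w in data:
        num[group] += w * y
        den[group] += w
    return {g: num[g] / den[g] for g in num}

print(weighted_means_by_group(rows))
```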

Survey data are different. We can use the mimstack command to do this; if it is not already installed, type search mimstack to find it.
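Stacking means appending the imputed data sets one below another, with a column recording which imputation each row came from. The Python sketch below shows that idea on two tiny hypothetical imputed data sets; the field name `imputation` is our own choice and is not necessarily the identifier name that mimstack creates, so check the mimstack help file for that.

```python
# Hypothetical imputed data sets: each is a list of dicts with the same cases.
imputed_sets = [
    [{"id": 1, "income": 50.0}, {"id": 2, "income": 42.0}],  # imputation 1
    [{"id": 1, "income": 50.0}, {"id": 2, "income": 47.0}],  # imputation 2
]

def stack(data_sets):
    """Concatenate imputed data sets, tagging each row with its imputation number (1-based)."""
    stacked = []
    for m, rows in enumerate(data_sets, start=1):
        for row in rows:
            tagged = dict(row)        # copy so the input data sets are not modified
            tagged["imputation"] = m  # which imputed data set the row came from
            stacked.append(tagged)
    return stacked

long_data = stack(imputed_sets)
print(len(long_data))  # 4
```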

While it may be possible to get reasonably accurate results using non-survey software, there is no practical way to know beforehand how far off those results will be. Under many sampling plans, the sum of the probability weights will equal the population total.
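The reason is that a probability weight is the inverse of a case's selection probability, so each case "stands for" that many population members. The sketch below assumes a hypothetical stratified simple random sample with no nonresponse or post-stratification adjustments; real final weights are often adjusted, so their sum may not match a published total exactly.

```python
# Hypothetical stratified design: population size N and sample size n per stratum.
strata = {
    "north": {"N": 1000, "n": 10},
    "south": {"N": 3000, "n": 20},
}

def design_weights(design):
    """Per-case probability weight w = N_h / n_h, the inverse of the selection probability n_h / N_h."""
    weights = []
    for s in design.values():
        weights.extend([s["N"] / s["n"]] * s["n"])  # every case in the stratum gets the same weight
    return weights

w = design_weights(strata)
print(sum(w))  # 4000.0
```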

After stacking the data sets, I keep only a few variables and then use the compress command to make the data set as small as possible. This will tell you how missing data were handled. These options are subpop and over.

Another important data management issue is how missing values are coded in your data set. Instead, Stata has provided two options that allow you to correctly analyze subpopulations of your survey data. The second thing that needs to be changed is at the very end of the file. You will get this information from the codebook. For example, you may be interested only in females, or only in people who call themselves white, or in white females.
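Many survey files store reasons for missingness (refused, don't know, and so on) as reserved numeric codes. If those codes are not converted to missing, they are averaged in as if they were real answers. The codes -9, -8 and -7 below are hypothetical; take the actual codes from the codebook.

```python
# Hypothetical codebook: these reserved codes mean "missing" for this variable.
MISSING_CODES = {-9, -8, -7}

def recode_missing(values, missing_codes=MISSING_CODES):
    """Replace reserved missing-value codes with None so they are not treated as real data."""
    return [None if v in missing_codes else v for v in values]

raw_age = [34, -9, 51, -8, 27]
clean_age = recode_missing(raw_age)
valid = [v for v in clean_age if v is not None]

print(clean_age)                    # [34, None, 51, None, 27]
print(sum(raw_age) / len(raw_age))  # 19.0, a meaningless mean that includes the codes
print(sum(valid) / len(valid))      # mean of the three real ages
```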

We will use proc freq to see how this variable is coded. You can change the reference level of the categorical variable by using the reflevel statement, as shown below. On the jackwgts statement we indicate which variables are the replicate weights. After that, we are ready to run the logistic regression. You will notice from the first two lines of the output that there were many thousands more observations read than were used in the analysis.
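Changing the reference level changes which category is left out of the dummy (indicator) coding, so each coefficient compares its category with a different baseline. The sketch below shows reference-cell coding in plain Python; `dummy_code` is a hypothetical helper, not the internal method used by SUDAAN's reflevel statement.

```python
def dummy_code(values, levels, reference):
    """Reference-cell coding: one 0/1 column per non-reference level.

    Cases in the reference level get all zeros, so each coefficient compares a level with it.
    """
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not one of {levels}")
    others = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in others] for v in values]
    return others, rows

race = [1, 2, 3, 1, 2]
cols, X = dummy_code(race, levels=[1, 2, 3], reference=1)
print(cols)  # [2, 3]
print(X)     # [[0, 0], [1, 0], [0, 1], [0, 0], [1, 0]]
cols, X = dummy_code(race, levels=[1, 2, 3], reference=3)
print(cols)  # [1, 2]
```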

The formulas for using both if and subpop are given, along with an explanation of how they are different. Hence, you will see some words in blue and others in red. The degrees of freedom are given in the last table. We will show examples using both of these methods of variance estimation.
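One concrete difference between if and subpop shows up in the design degrees of freedom, commonly computed as the number of PSUs minus the number of strata. Dropping cases with if can remove whole PSUs and strata before the variance is estimated, while subpop keeps the full design. The sample below is hypothetical, and the exact degrees of freedom a package reports may include further adjustments.

```python
# Hypothetical sample: (stratum, PSU, in-subpopulation flag) for each respondent.
sample = [
    (1, 1, True), (1, 1, False), (1, 2, False),
    (2, 3, True), (2, 4, True),
    (3, 5, False), (3, 6, False),
]

def design_df(rows):
    """Design degrees of freedom: number of distinct PSUs minus number of distinct strata."""
    strata = {s for s, _, _ in rows}
    psus = {(s, p) for s, p, _ in rows}
    return len(psus) - len(strata)

# subpop-style analysis keeps every case, so the full design is used.
df_subpop = design_df(sample)
# if-style analysis drops out-of-subpopulation cases before the design is seen.
df_if = design_df([r for r in sample if r[2]])
print(df_subpop, df_if)  # 3 1
```

The if-style subset also leaves stratum 1 with a single PSU, from which no within-stratum variance can be estimated.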

This should give you some output telling you about the data set. If multiple sampling weights have been included in the data set, there will be some instruction about when to use which one.

There are two things that you will need to change. The most common replicate weights are balanced repeated replication (BRR) and jackknife replicate weights. On the weight statement we indicate the pweight, sometimes called the final weight.
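With replicate weights, the estimate is recomputed once with each replicate weight, and the spread of those replicate estimates around the full-sample estimate gives the variance. The sketch below uses two textbook multipliers, the delete-one jackknife (JK1) and plain BRR without a Fay factor; the correct multiplier for a given data set must come from its documentation. The function names and the toy data are hypothetical.

```python
def weighted_mean(y, w):
    """Weighted mean sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def replicate_variance(y, full_w, rep_ws, method="jk1"):
    """Variance of a weighted mean from replicate weights.

    jk1: (R - 1) / R * sum((theta_r - theta)^2)   delete-one jackknife
    brr: 1 / R * sum((theta_r - theta)^2)         BRR, no Fay adjustment
    """
    theta = weighted_mean(y, full_w)
    reps = [weighted_mean(y, rw) for rw in rep_ws]
    R = len(reps)
    ss = sum((t - theta) ** 2 for t in reps)
    if method == "jk1":
        return (R - 1) / R * ss
    if method == "brr":
        return ss / R
    raise ValueError(f"unknown method: {method}")

# Toy single-stratum sample with equal weights; each JK1 replicate drops one case
# and rescales the others by n / (n - 1).
y = [1.0, 2.0, 3.0, 4.0]
full_w = [1.0] * 4
n = len(y)
rep_ws = [[0.0 if j == i else n / (n - 1) for j in range(n)] for i in range(n)]
v = replicate_variance(y, full_w, rep_ws, method="jk1")
print(round(v, 6))  # 0.416667
```

For this equal-weight case the jackknife reproduces the familiar s²/n for a mean, which is a useful sanity check.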

You must use either a temporary data set, as we do here, or a libname. Many of the calculations change depending on whether a sample is collected with or without replacement.
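The simplest example of this is the finite population correction. Under simple random sampling, the estimated variance of a sample mean is s²/n with replacement, and (1 − n/N)·s²/n without replacement. The sketch below shows only that textbook case; `var_of_mean` and its arguments are hypothetical names, and complex designs apply the correction per stratum.

```python
from statistics import variance

def var_of_mean(sample, population_size=None):
    """Estimated variance of the sample mean under simple random sampling.

    With replacement (population_size=None): s^2 / n.
    Without replacement: (1 - n/N) * s^2 / n, the finite population correction.
    """
    n = len(sample)
    s2 = variance(sample)  # sample variance with an n - 1 denominator
    v = s2 / n
    if population_size is not None:
        v *= 1 - n / population_size
    return v

y = [1.0, 2.0, 3.0, 4.0]
print(var_of_mean(y))     # with replacement
print(var_of_mean(y, 8))  # without replacement from N = 8: half as large
```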

You can use options on the proc descript statement to add other statistics to this output, as well as adding an output or print statement. The double dash is used to indicate positionally consecutive variables in the data set. As mentioned before, there are still variables with missing data in the multiply imputed data sets. Although these three lines of code seem complex, once you have them correct, they do not need to be modified again. Often researchers are interested only in a certain subpopulation, or group, of respondents.
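With multiply imputed data, the analysis is run once per imputed data set and the results are then pooled. The standard pooling formulas are Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance is W + (1 + 1/m)B, where W is the mean within-imputation variance and B is the between-imputation variance. The sketch below implements only those formulas (not the degrees-of-freedom adjustment); `rubin_combine` and the numbers are hypothetical.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules.

    Returns (pooled estimate, total variance) with total = W + (1 + 1/m) * B.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # within-imputation variance
    b = variance(estimates)   # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1 + 1 / m) * b

q, t = rubin_combine([10.0, 12.0, 11.0], [1.0, 1.2, 0.8])
print(q, t)
```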

We will look at seven categories of race. You can use categorical variables in your regression by using the subgroup and levels statements.
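As we understand it, SUDAAN expects a variable on the subgroup statement to be coded as consecutive integers 1 through k, with k given on the levels statement (check the SUDAAN manual for how other values are handled). That recoding belongs in your data management step, before the procedure is run. The sketch below shows one way to build such a code; `recode_to_levels` is a hypothetical helper.

```python
def recode_to_levels(values):
    """Map each distinct non-missing value to 1..k in sorted order.

    Returns the recoded list, the mapping, and k (the number of levels).
    None (missing) stays None.
    """
    distinct = sorted({v for v in values if v is not None})
    mapping = {v: i for i, v in enumerate(distinct, start=1)}
    recoded = [None if v is None else mapping[v] for v in values]
    return recoded, mapping, len(distinct)

race = [0, 5, 2, None, 5, 0]
recoded, mapping, k = recode_to_levels(race)
print(recoded)  # [1, 3, 2, None, 3, 1]
print(k)        # 3
```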

You can specify more than one variable on this statement if you need to set the reference values for more than one variable. All of your data management must be done in another package.

We will make a crosstab of gender and race. Instead, use the subpopn statement with the complete data set. Next I add some value labels, and then I arrange the variables in alphabetical order with the aorder command.
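Each cell of a weighted crosstab carries two counts: the number of respondents in the cell and the sum of their weights, which estimates the number of people in the population with that combination. The sketch below computes both for hypothetical respondents; `weighted_crosstab` is our own name, and standard errors for the cells would additionally require the design variables.

```python
from collections import defaultdict

# Hypothetical respondents: (gender, race, sampling weight).
respondents = [
    ("female", "white", 100.0),
    ("female", "black", 50.0),
    ("male", "white", 80.0),
    ("male", "white", 20.0),
    ("male", "black", 30.0),
]

def weighted_crosstab(rows):
    """Return {(row_value, col_value): (unweighted count, weighted count)}."""
    cells = defaultdict(lambda: [0, 0.0])
    for g, r, w in rows:
        cells[(g, r)][0] += 1  # respondents in the cell
        cells[(g, r)][1] += w  # estimated population count for the cell
    return {key: tuple(val) for key, val in cells.items()}

table = weighted_crosstab(respondents)
for key in sorted(table):
    print(key, table[key])
```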

In the third column of the table at the top, we see the total number of cases involved in the analysis. You may also want to check periodically to see if there are any updates to mim or similar procedures. The alias is proc rlogist. Like the variable names, the numbers of levels are separated by spaces. Then you give a run statement and you are finished!