
#MERGE STATA CODE#
Many statistical analyses require that we combine several datasets. We might get information about the independent variable from one source and want to analyze its effects on a dependent variable from another source. We then have the same observations in both datasets but want to combine the variables. We do this with the command merge.

Another scenario is when we have different datasets with the same variables but with different units of analysis. In those cases, we use the command append, which is really the same as pasting more observations onto the dataset. Since merge is a bit trickier, we will focus on that command in this guide.

With the merge command we can do three (or really, two) different types of merges. The first is 1:1, which means that we have the same observations in both datasets, for instance the same individuals or the same countries, and just want to add more information about them. We then need a matching variable, a key, that shows each country's or person's identity, so that we can match the information in the two datasets.

But we can also merge m:1. This means that we have many observations in the dataset that is active in Stata and want to add data from a smaller number of observations at a higher level. For instance, we might have data on persons in Europe and want to add country-level information. Imagine that we have conducted a survey and now want to see whether the answers are affected by whether the respondent's country of residence is an EU member or not. All persons in Sweden will then get the same value, since Sweden is part of the EU.

The first step is to identify the ID variable. In this case it is the country name, stored in the variable country: we want to match the level of democracy in Afghanistan with the level of corruption in Afghanistan. If we don't have an ID variable of this type, it is impossible to do the merge. Generally, it is preferable if the ID is a number rather than a string (text) variable, since this reduces the risk that the merge is hindered by spelling errors and the like. The ID can also consist of several variables. For instance, when working with this type of data we often want to match on both country and year: Sweden 2008 should be matched to Sweden 2008, and so on.

Next, make sure that the ID variable has the same name in both datasets.

Then enter the code to merge the datasets, matched on the ID variable. In this case we have loaded the corruption data. The principle is that we write merge, then the type of merge we want to do, then the ID variable, and then the dataset we want to match the active data with. Here, we want a 1:1 merge with the democracy data.

Finally, check the results and deal with errors. The summary shows that 168 observations were successfully matched. But we can also see that 50 observations could not be matched: 12 from the "master" dataset, that is, the one we had active (the corruption data), and 38 from the "using" dataset, the democracy data. This is not necessarily a problem, for instance if we match data on EU countries with data from the whole world; we will then of course not be able to match African countries with the EU data. Here, however, we might have some problems, since there are countries in both datasets that could not find a match. Often this is caused by errors in the ID variable. Luckily, Stata makes it easy to identify the problematic observations. The merge command creates a new variable called _merge. Observations with the code 1 were only found in the original ("master") data. Observations with the code 2 were only found in the new ("using") data. Observations with the code 3 were successfully matched.
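The 1:1 merge and the check of _merge can be tried out on a small scale. The sketch below is a minimal, self-contained example, not the guide's original code. The country names and values are made up, and the variable names corruption and democracy are hypothetical. The democracy data is kept in a tempfile (here called demfile) so that the example runs on its own. With real data, using would instead point to the saved democracy dataset, whose file name is not given here.

```stata
* Hypothetical toy "using" data: made-up values, not real democracy scores
clear
input str20 country democracy
"Afghanistan" 1
"Sweden" 2
"Norway" 3
end
tempfile demfile
save `demfile'

* Hypothetical toy "master" data: the dataset that is active when we run merge
clear
input str20 country corruption
"Afghanistan" 4
"Sweden" 5
"Denmark" 6
end

* 1:1 merge on the ID variable country
merge 1:1 country using `demfile'

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge

* Look at the observations that could not be matched
list country _merge if _merge != 3
```

With these toy data, Afghanistan and Sweden get _merge == 3. Denmark gets 1 because it is found only in the active corruption data, and Norway gets 2 because it is found only in the democracy data. The final list command is one way to inspect the unmatched observations and spot, for instance, a country name that is spelled differently in the two datasets.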
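Matching on both country and year works the same way. The ID is simply written as two variables after the merge type. This is again a sketch with made-up values, and the variable names (democracy, corruption) and the tempfile name are hypothetical.

```stata
* Hypothetical panel data identified by country and year (made-up values)
clear
input str20 country year democracy
"Sweden" 2008 1
"Sweden" 2009 2
"Norway" 2008 3
end
tempfile demfile
save `demfile'

clear
input str20 country year corruption
"Sweden" 2008 4
"Sweden" 2009 5
"Norway" 2009 6
end

* The key is the combination of country and year,
* so Sweden 2008 is only matched to Sweden 2008, and so on
merge 1:1 country year using `demfile'
tabulate _merge
```

Here Norway 2009 is left unmatched (_merge == 1) and Norway 2008 is also unmatched (_merge == 2), even though the country name exists in both datasets. The key is the country-year pair, not the country alone.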
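An m:1 merge along the lines of the survey example can be sketched as follows. The respondent-level variables id and answer, the country-level variable eu_member, and all values are hypothetical. Only the structure of many respondents per country and one row per country follows the text.

```stata
* Hypothetical country-level data: one row per country
clear
input str20 country eu_member
"Sweden" 1
"Norway" 0
end
tempfile countries
save `countries'

* Hypothetical survey data: several respondents per country (made-up answers)
clear
input id str20 country answer
1 "Sweden" 3
2 "Sweden" 5
3 "Norway" 4
4 "Norway" 2
5 "Sweden" 1
end

* m:1: many respondents in the active (master) data,
* one row per country in the using data
merge m:1 country using `countries'
tabulate country eu_member
```

Because the country data contains each country only once, every Swedish respondent receives the same value, eu_member == 1, and every Norwegian respondent receives 0.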
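This guide focuses on merge, but the append case mentioned at the start can be sketched just as briefly. The two datasets are hypothetical. They share the same variables (country and corruption, with made-up values) but contain different countries.

```stata
* Hypothetical first dataset, saved to a tempfile
clear
input str20 country corruption
"Denmark" 1
"Finland" 2
end
tempfile nordic
save `nordic'

* Hypothetical second dataset with the same variables, left active
clear
input str20 country corruption
"Spain" 3
"Italy" 4
end

* append pastes the observations from the saved file below the active data
append using `nordic'
list
```

After append, the active dataset has four observations: the two it started with and the two from the saved file. No key variable is needed, because no observations are being matched.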
