Longitudinal data

Steve Simon

2002-07-26

Dear Professor Mean, I have longitudinal data on the growth pattern of patients given growth hormone. How should I store the data? –Jittery Jerry

Dear Jittery,

You have two choices:

  1. A single record per patient, multiple variables
  2. Multiple records per patient, single variable

but a better choice may be to use a mixture of both types.

Examples of the two formats

Here is an example of the single record, multiple variable format:


Name    Gender   Measure1   Measure2   Measure3   Measure4
Abby    Female     aaa        bbb        ccc        ddd
Dean     Male      ddd        eee        fff        ggg
Hilda   Female     hhh        iii        jjj        kkk
Nora    Female     nnn        ooo        ppp        qqq
Tucker   Male      ttt        uuu        vvv        www


The single record, multiple variable format is short and wide. You will frequently scroll to the left and right with this format.

Here is the same data in the multiple record, single variable format:


Name    Gender   Time   Measure
Abby    Female    1       aaa
Abby    Female    2       bbb
Abby    Female    3       ccc
Abby    Female    4       ddd
Dean     Male     1       ddd
Dean     Male     2       eee
Dean     Male     3       fff
Dean     Male     4       ggg
Hilda   Female    1       hhh
Hilda   Female    2       iii
Hilda   Female    3       jjj
Hilda   Female    4       kkk
Nora    Female    1       nnn
Nora    Female    2       ooo
Nora    Female    3       ppp
Nora    Female    4       qqq
Tucker   Male     1       ttt
Tucker   Male     2       uuu
Tucker   Male     3       vvv
Tucker   Male     4       www


The multiple record, single variable format is tall and narrow. If you have a lot of repeated measurements, you will end up scrolling up and down a lot. Notice that there is a lot of repetition in this format: the name and gender appear again on every record for the same patient.

Advantages of the single record, multiple variable format

Advantages of the multiple record, single variable format

In SPSS you can switch from either format to the other. Select Data | Restructure from the SPSS menu. The steps you follow depend heavily on the context of your particular data set, so an example here would not help that much. Sorry!
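If you work outside SPSS, the same switch can be sketched in Python with the pandas library. This is only an illustration under my own assumptions: pandas is not part of the SPSS steps above, the variable names wide, long_df, and wide_again are made up for the example, and the data is the first two patients from the tables earlier on this page.

```python
import pandas as pd

# Single record, multiple variable format (short and wide).
wide = pd.DataFrame({
    "Name": ["Abby", "Dean"],
    "Gender": ["Female", "Male"],
    "Measure1": ["aaa", "ddd"],
    "Measure2": ["bbb", "eee"],
    "Measure3": ["ccc", "fff"],
    "Measure4": ["ddd", "ggg"],
})

# Wide -> tall: stack the four Measure columns into one Measure column.
long_df = wide.melt(
    id_vars=["Name", "Gender"],
    var_name="Time",
    value_name="Measure",
)
# The Time column now holds labels like "Measure3"; keep only the number.
long_df["Time"] = (
    long_df["Time"].str.replace("Measure", "", regex=False).astype(int)
)
long_df = long_df.sort_values(["Name", "Time"]).reset_index(drop=True)

# Tall -> wide: spread the Time values back out across columns.
wide_again = (
    long_df.pivot(index=["Name", "Gender"], columns="Time", values="Measure")
    .add_prefix("Measure")
    .reset_index()
)
wide_again.columns.name = None  # drop the leftover "Time" label

print(long_df)
print(wide_again)
```

As with the SPSS Restructure dialog, the details depend on how your own variables are named, so treat the column names here as placeholders.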

Time varying data and time constant data

For a very complex longitudinal study, you may find it easier to split the data into two tables. The first table will contain the time constant data. This is data that does not change for the duration of the study. Most demographic variables, like gender and race, are time constant.

The second table will contain the time varying data. This is data that changes over time. Physical measurements like weight change over time.

You may find that some of your data does not fit nicely in these two categories, and you have a choice about how to handle this type of data. For example, you could store the age at each visit as time varying data, or you could just record the age at the first visit as time constant data.

When you split the data, you need to have a key variable that allows you to link the two files together.

Here’s an example of the time constant data.


Id   Name     Gender
1    Abby     Female
2    Dean      Male
3    Hilda    Female
4    Nora     Female
5    Tucker    Male


And here is the time varying data.


Id   Time   Measure
1     1       aaa
1     2       bbb
1     3       ccc
1     4       ddd
2     1       ddd
2     2       eee
2     3       fff
2     4       ggg
3     1       hhh
3     2       iii
3     3       jjj
3     4       kkk
4     1       nnn
4     2       ooo
4     3       ppp
4     4       qqq
5     1       ttt
5     2       uuu
5     3       vvv
5     4       www
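If your data already sits in the tall format, the split into two tables might look something like this in Python with pandas. Treat it as a sketch under assumptions: pandas is my choice rather than anything SPSS requires, the names constant and varying are invented for the example, and the code assumes that Name and Gender together identify a patient, which you would need to confirm for your own data.

```python
import pandas as pd

# Tall data for the first two patients, as in the earlier example.
long_df = pd.DataFrame({
    "Name": ["Abby"] * 4 + ["Dean"] * 4,
    "Gender": ["Female"] * 4 + ["Male"] * 4,
    "Time": [1, 2, 3, 4] * 2,
    "Measure": ["aaa", "bbb", "ccc", "ddd", "ddd", "eee", "fff", "ggg"],
})

# Time constant table: one row per patient.
# ASSUMPTION: Name plus Gender is enough to tell patients apart.
constant = long_df[["Name", "Gender"]].drop_duplicates().reset_index(drop=True)
# Add a numeric key variable (1, 2, ...) as the first column.
constant.insert(0, "Id", list(range(1, len(constant) + 1)))

# Time varying table: look up each record's Id, then drop the
# repeated Name and Gender columns.
varying = long_df.merge(constant, on=["Name", "Gender"])[["Id", "Time", "Measure"]]

print(constant)
print(varying)
```

The Id column is the key variable. It is the only piece of information that appears in both tables.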


Merging time constant data with time varying data

When you merge the time constant and time varying data together, you should inform SPSS that your time constant data is the “keyed table.” You must have a key variable that links the two tables together. The key variable has to have the same name and the same type in both tables. If your key variable is numeric in one table and string in another table, then you cannot merge the files together in SPSS. Finally, you have to make sure that both tables are sorted by the key variable.
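The same checks are worth making in any software. Here is a rough Python sketch using pandas, with a helper I have named prepare_key purely for illustration (it is not a pandas or SPSS function). It assumes the key is called Id and that a string key can be converted cleanly to the keyed table's type. pandas itself does not need sorted tables for a merge; the sort is there only to mirror the SPSS requirement.

```python
import pandas as pd

def prepare_key(keyed, other, key="Id"):
    """Hypothetical helper: return copies of both tables with a
    matching, sorted key variable."""
    # Same name in both tables.
    if key not in keyed.columns or key not in other.columns:
        raise KeyError(f"key variable {key!r} must be in both tables")
    keyed = keyed.copy()
    other = other.copy()
    # Same type in both tables (for example, numeric versus string).
    if keyed[key].dtype != other[key].dtype:
        other[key] = other[key].astype(keyed[key].dtype)
    # The keyed table should have exactly one record per key value.
    if keyed[key].duplicated().any():
        raise ValueError("keyed table has duplicate key values")
    # Sort both tables by the key; a stable sort keeps the original
    # order of records within each patient.
    keyed = keyed.sort_values(key, kind="stable").reset_index(drop=True)
    other = other.sort_values(key, kind="stable").reset_index(drop=True)
    return keyed, other

constant = pd.DataFrame({"Id": [1, 2, 3], "Name": ["Abby", "Dean", "Hilda"]})
# Here the key was read in as a string and the records are unsorted.
varying = pd.DataFrame({
    "Id": ["2", "1", "1", "3"],
    "Time": [1, 1, 2, 1],
    "Measure": ["ddd", "aaa", "bbb", "hhh"],
})

constant_ready, varying_ready = prepare_key(constant, varying)
print(varying_ready)
```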

It is simplest to start with the time constant data. Select Data | Merge Files | Add Variables from the SPSS menu.

In the Add Variables: Read File dialog box, you tell SPSS where to find the time varying data. Then click on the Open button.

SPSS will exclude any variable that has the same name in both data sets. The excluded variables in almost every case represent the key variable(s) that you use to link the two files together. Select the Match cases on key variables in sorted files option box and add id to the Key Variables field. Then select the Working Data File is keyed table option circle. If you had started instead with the time varying data, then you would choose the option circle just above instead.

After you are done, be sure to save your data using a different name. Otherwise, the merged data will be saved on top of the time constant data.
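For comparison, here is roughly how the same merge could look in Python with pandas. This is a sketch rather than a translation of the SPSS dialog boxes: the table names and the output file name merged_longitudinal.csv are assumptions of mine, and the validate="many_to_one" option plays a role similar to declaring the time constant data as the keyed table, because it stops the merge if an Id appears more than once there.

```python
from pathlib import Path
import tempfile

import pandas as pd

constant = pd.DataFrame({
    "Id": [1, 2],
    "Name": ["Abby", "Dean"],
    "Gender": ["Female", "Male"],
})
varying = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2, 2, 2],
    "Time": [1, 2, 3, 4] * 2,
    "Measure": ["aaa", "bbb", "ccc", "ddd", "ddd", "eee", "fff", "ggg"],
})

# Attach the time constant variables to every time varying record.
# validate="many_to_one" raises an error if Id is not unique in constant.
merged = varying.merge(constant, on="Id", how="left", validate="many_to_one")
merged = merged[["Id", "Name", "Gender", "Time", "Measure"]]

# Save under a new name so neither original table is overwritten.
# The file name and temporary folder are placeholders for illustration.
out_path = Path(tempfile.gettempdir()) / "merged_longitudinal.csv"
merged.to_csv(out_path, index=False)

print(merged)
```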

Pre-test/post-test study

The simplest longitudinal design is a pre-test/post-test study. In this design, you take a measurement, apply an intervention to some or all of your patients and then take another measurement. Your analysis will usually involve either the computation of a change score (post-test measurement minus the pre-test measurement) or the use of the pre-test measurement as a covariate. For both of these approaches, the single record, multiple variables format works best.
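To see why the single record format is convenient here, consider this small Python sketch using pandas. The numbers, the Group variable, and the column names Pre and Post are all invented for illustration and do not come from any real study.

```python
import pandas as pd

# One record per patient, with the pre-test and post-test side by side.
# All values below are made up for this example.
scores = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Group": ["treated", "treated", "control", "control"],
    "Pre": [10.0, 12.0, 11.0, 9.0],
    "Post": [14.0, 15.0, 11.5, 9.5],
})

# Change score: post-test measurement minus pre-test measurement.
scores["Change"] = scores["Post"] - scores["Pre"]

# Average change within each group.
mean_change = scores.groupby("Group")["Change"].mean()

print(scores)
print(mean_change)
```

Because both measurements sit on the same record, the change score is a single subtraction of two columns. The Pre column is also ready to use as a covariate without any restructuring.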

Summary

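With longitudinal data, you have two possible formats for your data:

  1. A single record per patient, multiple variables
  2. Multiple records per patient, single variable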

For complex studies it may make the most sense to split the data into two tables consisting of:

  1. Time constant data
  2. Time varying data

Be sure to include a key variable to link the two tables together.

Further reading

Stats: Merging files in SPSS

You can find an earlier version of this page on my original website.