Anonymizing ids

Steve Simon

2021-06-10

I’m working on a project that might be exempt from IRB review…if I can convert a particular ID code to an anonymous code. The ID codes are an eight digit number. Here’s a simple approach that should work for this project.

The steps are

  1. Create a random number between 100,000,000 and 900,000,000.

  2. Add the same random number to each eight digit id.

  3. Link the eight and nine digit ids to the raw data.

  4. Store the merged data with the nine digit id, but not the eight digit id.

  5. Shred the random number.

The process of shredding the random number will prevent anyone from converting the nine digit id back into an eight digit id.

1. Create a random number between 100,000,000 and 900,000,000.

There are several easy ways to create a random number with a specific range. In Microsoft Excel, you would place the formula

=randbetween(100000000, 900000000)

Count your zeros carefully, there should be eight of them for both numbers.

This will produce a value. Mine is 480035489, but yours will be different.

There is a complication in Microsoft Excel that you have to work around. The randbetween function will recalculate everytime you make a change anywhere else in the workbook. You need to “freeze” the random number by copying and pasting it to a different cell using the “past values” option under “paste special.”

2. Add the same random number to each eight digit id.

Here’s some data with some made up id values

private_id
12345678
23608461
34871244
46134027
57396810
68659593
79922376
91185159

Here’s what those ID numbers look like when you add my random number. We now have a random nine digit id associated with each eight digit id.

private_id random_id
12345678 492381167
23608461 503643950
34871244 514906733
46134027 526169516
57396810 537432299
68659593 548695082
79922376 559957865
91185159 571220648

4. Store the merged data with the nine digit id, but not the eight digit id.

You now have a new random_id in place of the private id.

random_id report_date data
2 492381167 2024-02-09 144
3 492381167 2024-04-03 115
4 492381167 2024-03-16 152
5 492381167 2024-04-04 100
6 503643950 2024-02-27 158
7 503643950 2024-02-08 185
8 503643950 2024-02-27 175
9 514906733 2024-03-07 102
1

5. Shred the random number.

Once the random number is destroyed, the nine digit id cannot be converted back to the eight digit id.