Simulation Studies of
Social Security Number Validation
Introduction

All banks in the United States must have Bank Secrecy Act/Anti-Money Laundering (BSA/AML) compliance procedures. These procedures include a Customer Identification Program (CIP) requiring a taxpayer ID. Yet outside the context of employment, private institutions have no means to verify directly with the Social Security Administration (SSA) that a Social Security Number was issued to a specific, living person, unless they sign a remarkably stringent agreement with SSA specifying record retention policies and compliance reviews.

Fortunately, a simpler, effective solution does exist. Social Security Numbers (SSNs) can be validated against a list or table of numbers that were actually issued on a date that is consistent with a person's age. (See our related page on Fraud Detection.) This simple technique of Social Security Number validation is highly recommended for Customer Identification Programs, particularly at the time of account inception. On a per-transaction basis, Social Security Number validation is remarkably cheap because it requires nothing more than a computer-driven table lookup. Under several reasonable models of fraudulent use, it is also remarkably effective when combined with data for age or birth year and an accurate SSN issuance history.

Methods and Materials

We used Monte Carlo simulations in which Social Security Numbers, ages, and birth years were randomly generated under three different domains or protocols. Each of these protocols might reasonably be assumed to mimic a process by which such numbers, ages, and dates are fraudulently constructed. These "fraudulent" data were then validated against a table of all area and group numbers known to have been issued and their dates of issuance. The combination of SSN and birth year was counted as "detected" if the number is known never to have been issued.
Detections were also counted for numbers known to have been validly issued, but only before the claimed year of birth. This method takes advantage of the fact that, while Social Security Numbers can be issued to persons at any age, they can only be issued to persons actually alive on the dates of issuance.

The historical issuance data used in these simulations are based on proprietary information that extends back to 1936, with annual data for issuance beginning in 1952. On a few occasions, these data have proven to be more reliable than those published by the Social Security Administration (Reference 1).

Under Protocol 1, one thousand area and group numbers were chosen at random, but without duplication, using all digits zero through nine in every position. In addition, an age between 0 and 100 was randomly generated, which also yields a year of birth. For Protocol 2, these ages were randomly generated so as to closely match the age distribution of the estimated resident population of the United States on November 1, 2006 (Reference 2). In Protocol 3, randomly generated ages were chosen to closely match the age distribution (ages 15 through 75) of the male population of the United States counted in prisons in the decennial census on April 1, 2000 (Reference 3). All of the simulations were performed in early 2008.

Results

Table 1 shows the detection rates achieved for the ten simulations performed in 2008 under Protocol 1. The average detection rate of 70% is markedly better than what could be achieved without the histories of the dates of issuance of each area and group number: about 45%. The lower detection rate is based on the fact that 44,586 valid combinations are known to have been issued through January 2, 2008 (Reference 4).
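The detection rule and the Protocol 1 sampling described under Methods and Materials can be sketched in Python roughly as follows. This is only an illustration, not the code used for the study. The names `LAST_ISSUED`, `is_detected`, and `simulate_protocol_1` are hypothetical, and the three-row table is made-up toy data standing in for the proprietary issuance history. The sketch also assumes that the table records the last year in which each area and group combination was issued, that ages are whole numbers drawn uniformly from 0 through 100, and that birth year is the simulation year minus the age.

```python
import random

# Hypothetical toy issuance table: (area, group) -> last year in which that
# combination is known to have been issued. Made-up values, not real data.
LAST_ISSUED = {
    (1, 1): 1965,
    (1, 2): 2007,
    (2, 50): 1980,
}


def is_detected(area, group, birth_year, last_issued=LAST_ISSUED):
    """Return True if the area/group and birth-year combination is flagged."""
    last_year = last_issued.get((area, group))
    if last_year is None:
        return True  # combination is not known ever to have been issued
    # Issued, but only before the claimed year of birth.
    return last_year < birth_year


def simulate_protocol_1(n=1000, sim_year=2008, last_issued=LAST_ISSUED, seed=None):
    """Return the fraction of n random area/group/age draws that are flagged."""
    rng = random.Random(seed)
    # Draw n distinct area/group combinations from 000-00 through 999-99,
    # so every digit 0-9 can appear in every position.
    combos = rng.sample(range(100_000), n)
    detected = 0
    for combo in combos:
        area, group = divmod(combo, 100)
        age = rng.randint(0, 100)      # assumed uniform over whole ages 0..100
        birth_year = sim_year - age    # assumed conversion from age to birth year
        if is_detected(area, group, birth_year, last_issued):
            detected += 1
    return detected / n
```

With a realistic issuance table in place of the toy data, calling `simulate_protocol_1()` ten times with different seeds would correspond to one set of ten simulations.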
The detection rates of the ten simulations performed according to Protocol 2 are shown in Table 2. The average detection rate in these simulations is 75%. All of these simulations resulted in a detection rate equal to or better than would have been the case if the ages 0 through 100 had been chosen uniformly at random, rather than (randomly) to match the population age structure. This is because the actual resident U.S. population does not have the same number of persons at every single year of age, and because the average age of the population is 36.
Table 3 presents the results of the ten simulations run under Protocol 3. In this case, the youngest population structure among the three simulation domains (estimated average age, 33) results in the best rates of detection. The average detection rate for Protocol 3 is 78%.
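The age sampling that distinguishes Protocols 2 and 3 from Protocol 1 might be sketched as a weighted random draw. The function name `sample_ages` and the `age_counts` argument are hypothetical. The sketch assumes the population is supplied as a mapping from single year of age to a head count, which would have to be built from the census tables cited in References 2 and 3.

```python
import random


def sample_ages(age_counts, n, seed=None):
    """Draw n ages with probability proportional to the count at each age."""
    rng = random.Random(seed)
    ages = list(age_counts)
    weights = [age_counts[age] for age in ages]
    # random.choices samples with replacement according to the given weights.
    return rng.choices(ages, weights=weights, k=n)
```

For example, `sample_ages({20: 5, 40: 15}, 1000)` (made-up counts) would return age 40 about three times as often as age 20. A population weighted toward younger ages yields later claimed birth years, so more numbers fall into the "issued only before the claimed year of birth" category.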
Figure 1 summarizes the detection rates from each simulation under each of the three protocols. The reference line in the figure shows the expected detection rate that would be achieved by simple number validation, without using information for SSN issuance histories. As the figure shows, the issuance histories greatly improve the detection rate.
Discussion

It is unlikely that very many perpetrators of fraud with Social Security Numbers are using random number generators to construct the numbers. According to the Social Security Administration, the most abused SSN in history (078-05-1120) was claimed inappropriately by more than 40,000 persons using a number that was not randomly chosen (Reference 5). Certainly some perpetrators depend on the use of SSNs they know to have actually been issued and that are still valid. Detection methods other than simple validation are necessary to find these.

Even so, we know from the results of our own use of Social Security Number validation that many perpetrators do simply make up the numbers they use for fraudulent purposes. While some of these fraudulent numbers have a lower detection probability than the random numbers generated for this study, we know from our twenty years of practice in this field that many perpetrators do not succeed nearly as well and are easy to catch.

Conclusion

Social Security Number validation based on known dates of issuance is both inexpensive and effective. Because of this cost-benefit ratio, SSN validation should be routinely used for Customer Identification Programs.

References
1. Alice K. Whitfield, "HighGroup Listing of SSNs," July 18, 2003, The Risks Digest, Volume 22: Issue 81, Uniform Resource Locator:
2. U. S. Census Bureau, Monthly Postcensal Resident Population for the 2000s, Uniform Resource Locator:
3. U. S. Census Bureau, Population in Group Quarters by Type, Sex and Age, for the United States: 1990 and 2000 (PHC-T-26), Uniform Resource Locator:
4. Social Security Administration, Social Security Number Verification Service (SSNVS), Uniform Resource Locator:
5. Social Security Administration, Social Security Cards Issued by Woolworth
Copyright © Quality Control Systems Corp. All rights reserved.