In my previous post on data destruction, I implement a quick solution to encrypt SSN filtering on – REGEXP_LIKE(ssn,’^[[:digit:]]{9}$’).
As the result, even invalid SSN will be encrypted. It all depends on how accurate the process should be. However, keep in mind the above implementation will modify more rows than necessary.
Here is an improved version with test case.
Test Case:
create table t(ssn varchar2(40));
insert into t values(‘326217894’);
insert into t values(‘4522969912’);
insert into t values(‘000333333’);
insert into t values(‘111004444’);
insert into t values(‘004770000’);
insert into t values(‘023882391’);
insert into t values(‘3452371’);
insert into t values(‘123456’);
insert into t values(‘a123456’);
commit;
I found a solution from netmail but it would not work in Oracle:
(?!000\d{6}) Exclude any number that contains 000 followed by 6 digits
(?!\d{3}00\d{4}) Exclude any number that begins with 3 digits, followed by 00, followed by 4 digits
(?!\d{5}0000) Exclude any number that begins with 5 digits followed by 0000
(?!666\d{6}) Exclude any number that contains 666 followed by 6 digits
The ! (exclamation) would return ZERO rows in Oracle with SQL*Plus. The alternative is to use not regexp_like.
INVALID SSN:
SQL> select ssn,lpad(ssn,9,0) fmt_ssn from t where regexp_like(lpad(ssn,9,0),'(?000\d{6})|(?\d{3}00\d{4})|(?\d{5}0000)|(?666\d{6})')
and length(ssn) between 7 and 9; 2
SSN FMT_SSN
------------ ------------
000333333 000333333
111004444 111004444
004770000 004770000
9-DIGIT STRING:
SQL> select ssn,lpad(ssn,9,0) fmt_ssn from t where regexp_like(lpad(ssn,9,0),'^[0-8]{3}[0-9]{2}[0-9]{4}$')
and length(ssn) between 7 and 9; 2
SSN FMT_SSN
------------ ------------
326217894 326217894
000333333 000333333
111004444 111004444
004770000 004770000
023882391 023882391
3452371 003452371
ENCRYPT VALID SSN:
SQL> select ssn,lpad(ssn,9,0) fmt_ssn, DBMS_CRYPTO.hash(UTL_RAW.cast_to_raw(ssn),3) encrypt_ssn
from t
where not regexp_like(lpad(ssn,9,0),'(?000\d{6})|(?\d{3}00\d{4})|(?\d{5}0000)|(?666\d{6})')
and regexp_like(lpad(ssn,9,0),'^[0-8]{3}[0-9]{2}[0-9]{4}$')
and length(ssn) between 7 and 9; 2 3 4 5
SSN FMT_SSN ENCRYPT_SSN
------------ ------------ ----------------------------------------
326217894 326217894 681549D4B1597FF63530A6E8B9735241BBDACE56
023882391 023882391 DE30969505E26F6EA1849A4EA600619686977744
3452371 003452371 95A8011153DC2AF120E5C60CEB167C97809495E1
Tested on SQL*Plus: Release 11.2.0.2.0
References:
http://help.netmail.com/pages/viewpage.action?pagessn=6684926
http://www.ssa.gov/employer/randomization.html
http://www.ssa.gov/history/ssn/geocard.html
2012/OCT/16 UPDATE:
Comparison between regexp and substr
select /*+ PARALLEL(a,8) FULL(a) */count(*)
from t a
WHERE regexp_like(ssn,'^[0-8]{1,3}[0-9]{2}[0-9]{4}$') AND
(
length(ssn)= 9 and substr(ssn,-4)<>'0000' and substr(ssn,-6,2)<>'00' and substr(ssn,-9,3)<>'000' and substr(ssn,-9,3)<>'666' OR
length(ssn)= 8 and substr(ssn,-4)<>'0000' and substr(ssn,-6,2)<>'00' and substr(ssn,-8,2)<>'00' OR
length(ssn)= 7 and substr(ssn,-4)<>'0000' and substr(ssn,-6,2)<>'00' and substr(ssn,-7,1)<>'0'
);
COUNT(*)
----------
1634972
Elapsed: 00:00:04.60
select /*+ PARALLEL(a,8) FULL(a) */count(*)
from t a
where not regexp_like(lpad(ssn,9,0),'(?000\d{6})|(?\d{3}00\d{4})|(?\d{5}0000)|(?666\d{6})')
and regexp_like(ssn,'^[0-8]{1,3}[0-9]{2}[0-9]{4}$');
COUNT(*)
----------
1634972
Elapsed: 00:00:39.47