Thinking Out Loud

October 16, 2012

Using Oracle regexp_like to encrypt valid SSN

Filed under: 11g,oracle — mdinh @ 5:41 am

In my previous post on data destruction, I implement a quick solution to encrypt SSN filtering on  – REGEXP_LIKE(ssn,’^[[:digit:]]{9}$’).

As the result, even invalid SSN will be encrypted. It all depends on how accurate the process should be. However, keep in mind the above implementation will modify more rows than necessary.

Here is an improved version with test case.

Test Case:
create table t(ssn varchar2(40));
insert into t values(‘326217894′);
insert into t values(‘4522969912′);
insert into t values(‘000333333′);
insert into t values(‘111004444′);
insert into t values(‘004770000′);
insert into t values(‘023882391′);
insert into t values(‘3452371′);
insert into t values(‘123456′);
insert into t values(‘a123456′);
commit;

I found a solution from netmail but it would not work in Oracle:

(?!000\d{6})      Exclude any number that contains 000 followed by 6 digits
(?!\d{3}00\d{4})  Exclude any number that begins with 3 digits, followed by 00, followed by 4 digits
(?!\d{5}0000)     Exclude any number that begins with 5 digits followed by 0000
(?!666\d{6})      Exclude any number that contains 666 followed by 6 digits

The ! (exclamation) would return ZERO rows in Oracle with SQL*Plus. The alternative is to use not regexp_like.

INVALID SSN:

SQL> select ssn,lpad(ssn,9,0) fmt_ssn from t where regexp_like(lpad(ssn,9,0),'(?000\d{6})|(?\d{3}00\d{4})|(?\d{5}0000)|(?666\d{6})') 
and length(ssn) between 7 and 9;  2  

SSN          FMT_SSN
------------ ------------
000333333    000333333
111004444    111004444
004770000    004770000

9-DIGIT STRING:

SQL> select ssn,lpad(ssn,9,0) fmt_ssn from t where regexp_like(lpad(ssn,9,0),'^[0-8]{3}[0-9]{2}[0-9]{4}$') 
and length(ssn) between 7 and 9;  2  

SSN          FMT_SSN
------------ ------------
326217894    326217894
000333333    000333333
111004444    111004444
004770000    004770000
023882391    023882391
3452371      003452371

ENCRYPT VALID SSN:

SQL> select ssn,lpad(ssn,9,0) fmt_ssn, DBMS_CRYPTO.hash(UTL_RAW.cast_to_raw(ssn),3) encrypt_ssn
from t 
where not regexp_like(lpad(ssn,9,0),'(?000\d{6})|(?\d{3}00\d{4})|(?\d{5}0000)|(?666\d{6})')
and regexp_like(lpad(ssn,9,0),'^[0-8]{3}[0-9]{2}[0-9]{4}$')
and length(ssn) between 7 and 9;  2    3    4    5  

SSN          FMT_SSN      ENCRYPT_SSN
------------ ------------ ----------------------------------------
326217894    326217894    681549D4B1597FF63530A6E8B9735241BBDACE56
023882391    023882391    DE30969505E26F6EA1849A4EA600619686977744
3452371      003452371    95A8011153DC2AF120E5C60CEB167C97809495E1

Tested on SQL*Plus: Release 11.2.0.2.0

References:

http://help.netmail.com/pages/viewpage.action?pagessn=6684926

http://www.ssa.gov/employer/randomization.html

http://www.ssa.gov/history/ssn/geocard.html

2012/OCT/16 UPDATE:

Comparison between regexp and substr

select /*+ PARALLEL(a,8) FULL(a) */count(*)
from t a
WHERE regexp_like(ssn,'^[0-8]{1,3}[0-9]{2}[0-9]{4}$') AND
(
length(ssn)= 9 and substr(ssn,-4)<>'0000' and substr(ssn,-6,2)<>'00' and substr(ssn,-9,3)<>'000' and substr(ssn,-9,3)<>'666' OR
length(ssn)= 8 and substr(ssn,-4)<>'0000' and substr(ssn,-6,2)<>'00' and substr(ssn,-8,2)<>'00'  OR
length(ssn)= 7 and substr(ssn,-4)<>'0000' and substr(ssn,-6,2)<>'00' and substr(ssn,-7,1)<>'0'
);

  COUNT(*)
----------
   1634972

Elapsed: 00:00:04.60

select /*+ PARALLEL(a,8) FULL(a) */count(*)
from t a
where not regexp_like(lpad(ssn,9,0),'(?000\d{6})|(?\d{3}00\d{4})|(?\d{5}0000)|(?666\d{6})')
and regexp_like(ssn,'^[0-8]{1,3}[0-9]{2}[0-9]{4}$');

  COUNT(*)
----------
   1634972

Elapsed: 00:00:39.47

About these ads

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 300 other followers

%d bloggers like this: