Parcel ID parser – March 17, 2008

I don't know whether this is relevant only to Allegheny County or more universal, but a friend asked me to write a parcel ID (parcel/block/lot) parser to help with his real estate searches. It was a fun project (3 hours), and it was neat to see how well it worked out in the end.

It takes the messy input on the right and turns it into the normalized ID on the left:

0318-C-00080-0000-00 <= 318 C 080
0387-S-00002-0000-00 <= 387-S-2  387-M-148
0160-K-00013-0000-00 <= 0160-K-00013-0000-00
0124-P-00095-000A-00 <= 0124-p-00095-000a-00
1213-F-00377-0000-00 <= 1213F00377
0180-B-00041-0000-00 <= 0180-B-00041-0000-00
0495-F-00201-0000-00 <= Lot & Block 495-F-201
0009-S-00305-0000-00 <= 9-S-305
0309-D-00100-0000-00 <= 0309D00100000000
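
Judging from the examples above, each normalized ID has five fields: a 4-digit number, a single letter, a 5-digit number, a 4-character field, and a 2-character field, each left-padded with zeros. Here is a minimal sketch of just that formatting step, using PHP's built-in str_pad. The function name formatParcelId and its parameter names are hypothetical (my own labels for the fields) and are not part of the script:

```php
<?php
// Hypothetical helper, not part of the original script: assemble a
// normalized parcel ID from fields that have already been separated.
// The parameter names are illustrative guesses at what each field means.
function formatParcelId($map, $letter, $lot, $sub = "", $suffix = "", $sep = "-") {
    return implode($sep, array(
        str_pad($map, 4, "0", STR_PAD_LEFT),              // e.g. "318" -> "0318"
        strtoupper($letter),                              // single block letter
        str_pad($lot, 5, "0", STR_PAD_LEFT),              // e.g. "080" -> "00080"
        str_pad(strtoupper($sub), 4, "0", STR_PAD_LEFT),  // defaults to "0000"
        str_pad($suffix, 2, "0", STR_PAD_LEFT),           // defaults to "00"
    ));
}

echo formatParcelId("318", "C", "080"), "\n";      // 0318-C-00080-0000-00
echo formatParcelId("124", "p", "95", "a"), "\n";  // 0124-P-00095-000A-00
```

The hard part, of course, is getting from free-form text to those separate fields, which is what most of the script is about.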

The full script follows:

#!/usr/bin/php5
<?php
// Output field separator. You can set this to "" if you want
// the numbers to be smushed together in the output.
define("OSEP", "-");
// If run as a command-line program, you can run it like:
// ./property.php < filename.txt
// Otherwise, comment out this while loop and just call
// the function below.
while (($line = fgets(STDIN)) !== false) {
    // drop the trailing newline so the output stays on one line
    $line = rtrim($line, "\r\n");
    $match = guessNumber($line);
    if ($match)
        print "$match <= $line\n";
    else
        print "Invalid lot format: $line\n";
}
// guessNumber is the main entry point. You can pass in
// all sorts of messed-up text and it will try to figure it out.
// Anything after a tab character will be ignored.
function guessNumber($line) {
    // strip out stuff after the tab character, if there is one
    // (without this check, a line with no tab would be cut to nothing)
    $tab = strpos($line, "\t");
    if ($tab !== false)
        $line = substr($line, 0, $tab);
    // try the strictest separators first, then progressively looser ones
    foreach (array("-", "", " ", "[X_-]+", "[ ]+", "[ X_-]+") as $isep) {
        $match = guessNumberHelper($line, $isep);
        if ($match)
            return $match;
    }
    return false;
}
function guessNumberHelper($num, $sep) {
    // regular expression for matching; groups 1, 2, 5, 8 and 11
    // capture the five fields, the other groups wrap optional separators
    $pattern = "([0-9][0-9]?[0-9]?[0-9]?)".
        $sep."([A-Z])".
        "((".$sep.")?([0-9][0-9]?[0-9]?[0-9]?[0-9]?)".
        "((".$sep.")?([0-9A-Z][0-9A-Z]?[0-9A-Z]?[0-9A-Z]?)".
        "((".$sep.")?([0-9A-Z][0-9A-Z]?))?)?)?";
    $num = trim($num);
    $num = strtoupper($num);
    $num = str_replace(array("*","="), "", $num);
    // preg_match() is used because ereg() was removed in PHP 7
    if (!preg_match("/".$pattern."/", $num, $regs))
        return false;
    // trailing optional groups that did not match are left out of $regs,
    // so fill in empty strings for them
    $regs += array_fill(0, 12, "");
    // the lot number is required
    if (!$regs[5])
        return false;
    $str = padZeros($regs[1], 4);
    $str .= OSEP.$regs[2];
    $str .= OSEP.padZeros($regs[5], 5);
    $str .= OSEP.padZeros($regs[8], 4);
    $str .= OSEP.padZeros($regs[11], 2);
    // check for mostly empty entries
    if (strlen(str_replace(array("0","-"), "", $str)) < 3)
        return false;
    return $str;
}
// left-pad $val with zeros until it is at least $length characters long
function padZeros($val, $length) {
    return str_pad((string)$val, $length, "0", STR_PAD_LEFT);
}
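
If you would rather call the parser from other code than pipe a file through it, here is a condensed, self-contained sketch of the same idea: try separators from strict to loose, then zero-pad whatever was captured. It uses preg_match and str_pad. The name parseParcelId is hypothetical, the regex is a simplification of the pattern above (it requires the lot number up front and skips the "mostly empty" check), and it is not a drop-in replacement for the script:

```php
<?php
// Hypothetical condensed variant of the parser, not the original script.
function parseParcelId($text, $osep = "-") {
    $text = strtoupper(trim($text));
    // Try strict separators first, then progressively looser ones.
    foreach (array("-", "", " ", "[ X_-]+") as $sep) {
        $pattern = "/(\\d{1,4})" . $sep . "([A-Z])"
                 . "(?:" . $sep . ")?(\\d{1,5})"
                 . "(?:(?:" . $sep . ")?([0-9A-Z]{1,4})"
                 . "(?:(?:" . $sep . ")?([0-9A-Z]{1,2}))?)?/";
        if (preg_match($pattern, $text, $m)) {
            // Optional trailing groups are absent when they did not match.
            $m += array(4 => "", 5 => "");
            return implode($osep, array(
                str_pad($m[1], 4, "0", STR_PAD_LEFT),
                $m[2],
                str_pad($m[3], 5, "0", STR_PAD_LEFT),
                str_pad($m[4], 4, "0", STR_PAD_LEFT),
                str_pad($m[5], 2, "0", STR_PAD_LEFT),
            ));
        }
    }
    return false;
}

echo parseParcelId("318 C 080"), "\n";              // 0318-C-00080-0000-00
echo parseParcelId("1213F00377"), "\n";             // 1213-F-00377-0000-00
echo parseParcelId("Lot & Block 495-F-201"), "\n";  // 0495-F-00201-0000-00
var_dump(parseParcelId("no parcel here"));          // bool(false)
```

Like the script, this only picks up the first ID it finds on a line, so "387-S-2  387-M-148" comes back as 0387-S-00002-0000-00.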