Matching Methods:
Three matching methods are available in the Data Services Match transform for setting up match criteria:
1. Rule Based method
2. Weighted Scoring method
3. Combination method
Match Method | Description |
Rule Based | Allows controlling which criteria determine the match. |
Weighted Scoring | Allows assigning importance or weight to any criteria. |
Combination Method | Combines the rule-based and weighted scoring methods of matching. |
1. Rule Based Method:
With rule-based matching, we rely only on the match and no-match scores of the individual criteria to determine matches.
The following example shows how to set up this method in the Match transform.
Criteria | Record A | Record B | No Match Score | Match Score | Similarity Score |
First Name | Mary | Mary | 82 | 101 | 100 |
Last Name | Smith | Smitt | 74 | 101 | 80 |
E-mail | | | 79 | 80 | 91 |
By entering a value of 101 in the Match Score for every criterion except the last, we make sure that the First Name and Last Name criteria never determine a match.
By setting the Match and No Match Scores for the E-mail criterion with no gap between them (80 and 79), any comparison that reaches the last criterion must be either a match or a no-match.
A match score of 101 ensures that the criterion does not cause the records to be a match, because two fields cannot be more than 100 percent alike.
In rule-based match scenarios, we should avoid gaps between the Match Score and No Match Score.
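The rule-based evaluation described above can be sketched in Python. This is a minimal illustration, not the Match transform's actual implementation. It assumes that criteria are compared in the order listed, and that a similarity score at or below the No Match Score counts as a no-match (an assumed boundary, consistent with the 79/80 "no gap" setting). The `Criterion` class, the `rule_based_match` function, and the "undetermined" outcome are hypothetical names chosen for illustration; the "undetermined" result shows why gaps should be avoided, since a pair whose scores fall into every gap is never decided by any criterion.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    no_match_score: int  # similarity at or below this -> no-match (assumed boundary)
    match_score: int     # similarity at or above this -> match
    similarity: int      # 0-100, how alike the two field values are


def rule_based_match(criteria):
    """Compare criteria in order; the first criterion that decides wins."""
    for c in criteria:
        if c.similarity <= c.no_match_score:
            return "no-match", c.name
        if c.similarity >= c.match_score:
            return "match", c.name
    # Every criterion fell into its gap, so nothing decided the comparison.
    return "undetermined", None


# Values from the rule-based example table.
criteria = [
    Criterion("First Name", 82, 101, 100),
    Criterion("Last Name", 74, 101, 80),
    Criterion("E-mail", 79, 80, 91),
]
print(rule_based_match(criteria))  # ('match', 'E-mail')
```

With the example values, First Name and Last Name only pass the comparison on, because a Match Score of 101 can never be reached, and E-mail (91, at or above its Match Score of 80) decides the match.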
In the output file, the “Match Type” column contains the values D and R.
2. Weighted Scoring Method:
In the rule-based method, the application gives all of the criteria the same amount of importance (or weight). That is, if any single criterion scores as a no-match, the application determines that the records do not match, regardless of the other criteria. When we use the weighted scoring method, we rely on the total contribution score to determine matches, as opposed to using the match and no-match scores on their own.
Contribution Values:
Contribution values are our way of assigning weight to individual criteria. The higher the value, the more weight that criterion carries in determining matches. In general, criteria that might carry more weight than others include account numbers, Social Security numbers, customer numbers, Postcode1, and addresses.
Total 100: The contribution values of all criteria that have one must total 100. We do not need to assign a contribution value to every criterion.
We can define a criterion’s contribution value in the Contribution-To-Weighted-Score option in the Criteria-Definition option group.
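The "Total 100" rule can be expressed as a small configuration check. This is a hypothetical helper, not part of Data Services; it assumes the contribution values are collected in a dictionary, with `None` marking a criterion that has no contribution value (which the rule allows).

```python
def check_contribution_values(values):
    """Check that the contribution values that are set total exactly 100.

    values: dict mapping a criterion name to its contribution value,
    or to None when that criterion has no contribution value.
    """
    assigned = [v for v in values.values() if v is not None]
    total = sum(assigned)
    if total != 100:
        raise ValueError(f"contribution values total {total}, expected 100")
    return total


# Values from the weighted scoring example later in this section.
print(check_contribution_values({"First Name": 25, "Last Name": 25, "E-mail": 50}))  # 100

# A criterion without a contribution value is allowed (hypothetical names and values).
print(check_contribution_values({"Account Number": 60, "Address": 40, "Phone": None}))  # 100
```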
Contribution and total contribution score
The Match transform generates the contribution score for each criterion by multiplying the contribution value we assign by the similarity score (the percentage alike). These individual contribution scores are then added to get the total contribution score.
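The calculation can be sketched as follows. The function names are hypothetical, and the rounding rule (each contribution score rounded half up to a whole number) is an assumption inferred from the weighted scoring example table later in this section, where 50 × 91% = 45.5 appears as 46; the exact rounding the Match transform applies is not stated here.

```python
def contribution_score(contribution_value, similarity):
    """contribution_value * similarity / 100, rounded half up (assumed rounding).

    Integer arithmetic avoids floating-point surprises for non-negative inputs.
    """
    return (contribution_value * similarity + 50) // 100


def total_contribution_score(criteria):
    """criteria: list of (contribution_value, similarity) pairs."""
    return sum(contribution_score(v, s) for v, s in criteria)


# First Name, Last Name, E-mail from the weighted scoring example.
example = [(25, 100), (25, 80), (50, 91)]
print([contribution_score(v, s) for v, s in example])  # [25, 20, 46]
print(total_contribution_score(example))               # 91
```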
Weighted Match Score
In the weighted scoring method, matches are determined only by comparing the total contribution score with the weighted match score. If the total contribution score is equal to or greater than the weighted match score, the records are considered a match. If the total contribution score is less than the weighted match score, the records are considered a no-match.
We can set the weighted match score in the Weighted Match Score option of the Criteria Match Spec option group.
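The comparison above reduces to a single threshold test. The sketch below uses a hypothetical function name; it shows that a total contribution score equal to the weighted match score already counts as a match.

```python
def weighted_decision(total_contribution_score, weighted_match_score):
    """Weighted scoring: only the total contribution score decides."""
    if total_contribution_score >= weighted_match_score:
        return "match"
    return "no-match"


# Boundary behaviour for a total contribution score of 91:
for threshold in (90, 91, 92):
    print(threshold, weighted_decision(91, threshold))
# 90 match
# 91 match
# 92 no-match
```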
Weighted scoring example
The following table is an example of how to set up weighted scoring. Notice the various types of scores that we have discussed. Also notice the following:
· When setting up weighted scoring, the No Match Score must be set to -1 and the Match Score must be set to 101. These values ensure that no individual criterion's scores can determine either a match or a no-match.
· We have given the E-mail criterion the largest contribution value (50), which makes it the most important criterion.
Criteria | Record A | Record B | No Match | Match | Similarity | Contribution Value | Contribution Score |
First Name | Mary | Mary | -1 | 101 | 100 | 25 | 25 |
Last Name | Smith | Smitt | -1 | 101 | 80 | 25 | 20 |
E-mail | | | -1 | 101 | 91 | 50 | 46 |
Total Contribution | 91 |
In this example, the total contribution score is 91. If the weighted match score is 91 or less, the records are considered a match; if it is 92 or more, they are considered a no-match.
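The total of 91 can be checked by working through the multiplication described under "Contribution and total contribution score". The table appears to round each contribution score to a whole number, since the E-mail product is 45.5; that rounding step is an inference from the table, not a documented rule:

```latex
\begin{aligned}
\text{First Name:}\quad & 25 \times \tfrac{100}{100} = 25 \\
\text{Last Name:}\quad  & 25 \times \tfrac{80}{100} = 20 \\
\text{E-mail:}\quad     & 50 \times \tfrac{91}{100} = 45.5 \approx 46 \\
\text{Total contribution score:}\quad & 25 + 20 + 46 = 91
\end{aligned}
```

This total is then compared with the weighted match score: any weighted match score of 91 or less yields a match.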
In the output file, the “Match Type” column contains the values D and W.
3. Combination Method:
This strategy combines the rule-based and weighted scoring methods of matching.
· A no-match can be determined by any criterion whose similarity score does not equal or exceed its No Match Score. However, a match cannot be determined by the Match Score (every criterion must have a Match Score of 101).
· A match can be determined only by comparing a total contribution score with the weighted match score.
Criteria | Record A | Record B | No Match | Match | Similarity | Contribution Value | Contribution Score |
First Name | Mary | Mary | 59 | 101 | 100 | 25 | 25 |
Last Name | Smith | Hope | 59 | 101 | 22 | N/A | N/A |
E-mail | | | 49 | 101 | N/A | N/A | N/A |
Total Contribution | N/A |
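In the table above, the Last Name similarity score (22) falls below its No Match Score (59), so the comparison ends there as a no-match, and the E-mail criterion is never evaluated; this is why its similarity and contribution columns read N/A. The combination logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the Match transform's implementation: criteria are compared in the listed order, a similarity score equal to the No Match Score is also treated as a no-match (an assumed boundary), contribution scores are rounded half up, and the `Criterion` class and `combination_match` function are hypothetical names. The Match Score is not modelled because it is always 101 in this method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    name: str
    no_match_score: int
    similarity: int
    # None means the criterion has no contribution value (it can only force a no-match).
    contribution_value: Optional[int] = None


def combination_match(criteria, weighted_match_score):
    """Return (decision, deciding criterion or None, total contribution score or None)."""
    total = 0
    for c in criteria:
        # Rule-based part: any criterion can force a no-match.
        # Assumed boundary: a similarity equal to the No Match Score also counts.
        if c.similarity <= c.no_match_score:
            return "no-match", c.name, None
        # Every Match Score is 101, so no single criterion can force a match;
        # the criterion only adds its contribution score (rounded half up, assumed).
        if c.contribution_value is not None:
            total += (c.contribution_value * c.similarity + 50) // 100
    # Weighted part: only the total contribution score can produce a match.
    decision = "match" if total >= weighted_match_score else "no-match"
    return decision, None, total


# The example table; E-mail is left out because it is never reached.
# The weighted match score of 90 is hypothetical and does not affect this result.
table_example = [
    Criterion("First Name", 59, 100, 25),
    Criterion("Last Name", 59, 22),  # contribution value shown as N/A in the table
]
print(combination_match(table_example, weighted_match_score=90))
# ('no-match', 'Last Name', None)

# Hypothetical pair that passes every no-match check (all values illustrative).
hypothetical_pass = [
    Criterion("First Name", 59, 100, 25),
    Criterion("Last Name", 59, 80, 25),
    Criterion("E-mail", 49, 91, 50),
]
print(combination_match(hypothetical_pass, weighted_match_score=90))
# ('match', None, 91)
```

The second call shows the other half of the method: once no criterion has forced a no-match, only the total contribution score (here 91) compared with the weighted match score decides the outcome.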