This is a difficult read (especially as the code was not written with clarity in mind), but it's a really interesting topic. At issue is how you equate data elements that only partially match. For example, human readers have no problem knowing that the string "S. Korea" and the string "South Korea" refer to the same country. But for a computer, this is a difficult problem. This post describes one algorithm for matching these sorts of pairs. You might think: it's just country names, so do it by hand. But gRSShopper extracts author data from posts. Are "Clayton Wright" and "C.R. Wright" the same person? I have 8617 author records; I can't do it by hand. So: a difficult but significant problem.
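To make the problem concrete, here is a minimal sketch of one common approach: scoring how similar two strings are. It uses Python's standard-library `difflib.SequenceMatcher`, which is not the algorithm from the linked post and not how gRSShopper works. The `normalize` and `similarity` helpers and the sample pairs are assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop periods, and collapse whitespace (illustrative rules only)."""
    return " ".join(name.lower().replace(".", "").split())


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Hypothetical pairs taken from the examples in the text, plus one clear mismatch.
pairs = [
    ("S. Korea", "South Korea"),
    ("Clayton Wright", "C.R. Wright"),
    ("S. Korea", "Japan"),
]

for left, right in pairs:
    print(f"{left!r} vs {right!r}: {similarity(left, right):.2f}")
```

A score like this only ranks candidate matches. Any cutoff for treating two records as the same would have to be chosen and checked against real data, and a high score still can't settle whether "Clayton Wright" and "C.R. Wright" are the same person.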