Training computers to differentiate between people with the same name

All individuals are unique, but millions of people share names. How can we distinguish, or, as it is technically known, disambiguate, people with common names and determine whether a particular John Smith, Maria Garcia, Wei Zhang or Omar Ali is a specific, already known individual or someone previously unidentified?

Two computer scientists from the School of Science at Indiana University-Purdue University Indianapolis (IUPUI) and a Purdue University doctoral student have developed a novel machine-learning method to better solve this perplexing problem. They report that the new method improves on existing approaches to name disambiguation because it works on streaming data, which enables it to identify both previously encountered and previously unencountered John Smiths, Maria Garcias, Wei Zhangs and Omar Alis.

“Bayesian Non-Exhaustive Classification. A Case Study: Online Name Disambiguation using Temporal Record Streams” by Baichuan Zhang, Murat Dundar and Mohammad al Hasan is published in the Proceedings of the 25th International Conference on Information and Knowledge Management. Zhang is a Purdue graduate student.

Machine learning uses algorithms, which are sets of steps, to train computers to classify records belonging to different classes. These algorithms review data, learn patterns or features from it, and let the computer build a model that encodes the relationship between patterns and classes so that future records can be classified correctly. In the new study, for a given name, computers were “trained” on records of different individuals sharing that name to build a model that distinguishes among those individuals, including individuals about whom no information had been included in the training data.
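To make the general idea concrete, here is a minimal Python sketch of online name disambiguation over a record stream. It is not the researchers' method: the paper's title describes a Bayesian non-exhaustive classifier, while this sketch substitutes a much simpler nearest-profile rule with a fixed similarity threshold. Everything in it is a hypothetical illustration: the `Profile` and `OnlineDisambiguator` names, the bag-of-features record format (for example coauthor names and keywords), the cosine similarity measure and the 0.3 threshold. Each incoming record is assigned to the most similar known individual or, if nothing is similar enough, treated as a new, previously unseen person.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical profile of one individual: counts of features seen in their records."""
    label: str
    features: Counter = field(default_factory=Counter)

    def update(self, record):
        # Add this record's features (e.g. coauthors, keywords) to the profile.
        self.features.update(record)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two feature-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineDisambiguator:
    """Toy non-exhaustive classifier for one shared name (a stand-in, not the paper's model)."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold  # assumed cutoff below which a record is a new individual
        self.profiles: list[Profile] = []

    def train(self, labeled_records):
        # Build one profile per known individual from (label, features) pairs.
        by_label: dict[str, Profile] = {}
        for label, record in labeled_records:
            by_label.setdefault(label, Profile(label)).update(record)
        self.profiles = list(by_label.values())

    def process(self, record) -> str:
        # Assign a streaming record to the closest known individual, or create a new one.
        rec = Counter(record)
        best, best_sim = None, 0.0
        for profile in self.profiles:
            sim = cosine(rec, profile.features)
            if sim > best_sim:
                best, best_sim = profile, sim
        if best is None or best_sim < self.threshold:
            best = Profile(f"new-{len(self.profiles)}")  # previously unseen individual
            self.profiles.append(best)
        best.update(record)  # keep learning as the stream arrives
        return best.label


if __name__ == "__main__":
    model = OnlineDisambiguator(threshold=0.3)
    model.train([
        ("john_smith_A", ["databases", "query optimization", "coauthor:lee"]),
        ("john_smith_A", ["databases", "indexing", "coauthor:lee"]),
        ("john_smith_B", ["genomics", "sequencing", "coauthor:patel"]),
    ])
    print(model.process(["databases", "coauthor:lee", "transactions"]))  # john_smith_A
    print(model.process(["astronomy", "exoplanets"]))  # new-2
    print(model.process(["exoplanets", "telescopes"]))  # new-2
```

In this toy run, the first streamed record matches the known database researcher, the second matches no one and opens a new profile, and the third is then linked to that newly discovered person. A fixed threshold is the simplest possible way to decide "new individual"; the Bayesian approach named in the paper's title presumably makes that decision probabilistically instead.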