Firstly, let’s consider what we mean by a company name. CRM systems and other database structures often use the term ‘company name’, but in reality it’s a catch-all phrase for any organisational body that we are likely to do business with.
Sometimes other field descriptions are used, such as Organisation Name, Entity Name, Business Name etc.
But does this really matter? If we are talking about matching, and thinking in terms of good data quality management, then it does.
For instance, a company name is normally made up of two parts: the unique, recognisable name of the business and the type of legal entity that the business trades under.
For example: Microsoft Corporation / Tesco PLC / American Airlines Group Inc
On occasion the company name will also provide additional entity type information such as Group, Bank, Holding Company etc.
Some organisation names, such as those of public sector bodies or non-limited sole traders, will not include a legal entity element.
Does this matter?
It’s quite typical that matching solutions will either ask you to provide a list of business entity types or will have this knowledge embedded within their technology, and will look to strip this information out of the business name before any matching takes place.
For example: ‘Bank of America Corporation’ would become ‘Bank of America’.
This type of data preparation is designed to help the matching engine find the most appropriate matches by not focusing on generic keywords and terms that will appear many times within the data.
However, removing the entity type is not always helpful. Sometimes, especially for larger organisations, you will find that they have many subsidiaries with different legal structures, each registered as a separate business in its own right. In that case the preferred match is the part of the organisation with the same legal structure.
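As a minimal sketch of this preparation step, the Python below separates the entity type from the rest of the name but keeps it, so that it can still be used later to prefer a subsidiary with the same legal structure. The function name split_entity_type and the short suffix list are our own illustrative assumptions, not a complete reference list.

```python
# Illustrative, deliberately short list of legal entity designators (an assumption).
LEGAL_SUFFIXES = ("corporation", "corp", "inc", "plc", "ltd", "limited", "llc")


def split_entity_type(name):
    """Return (core_name, entity_type); entity_type is None when no suffix is found."""
    tokens = name.strip().split()
    # Only the final token is examined, and never when it is the whole name.
    if len(tokens) > 1:
        last = tokens[-1].rstrip(".").lower()
        if last in LEGAL_SUFFIXES:
            return " ".join(tokens[:-1]), last
    return name.strip(), None


core, entity = split_entity_type("Bank of America Corporation")
print(core, "|", entity)
```

Because the entity type is returned rather than discarded, a later step can match on the core name first and then use the entity type as a tie-breaker between otherwise similar candidates.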
Other Types of Business Name Entities
Using algorithms, artificial intelligence and logic specifically geared to the data in question will help to achieve the best possible results and simplify the matching process.
For example, if you are selling into the hotel sector then using a data description of Hotel Name, rather than a generic company name, would be more beneficial, with the logic and matching processes focused on identifying patterns, abbreviations and acronyms specific to that sector. The matching engine should be tuned to understand the nuances of a hotel name, recognising that words and phrases like Hotel and B&B are going to be of less importance than recognised hotel chain names like Hilton and Best Western.
The more specific the matching solution is to a particular type of entity name, the better the results you are likely to achieve.
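One simple way to express this kind of sector tuning is to weight each word by how much it tells you. The sketch below is an assumption-laden illustration: the weights are invented for the example, and weighted_overlap is a basic weighted token overlap rather than any particular product’s scoring method.

```python
# Hypothetical weights: generic hotel words count for little, chain names for a lot.
TOKEN_WEIGHTS = {
    "hotel": 0.1, "b&b": 0.1, "inn": 0.1,
    "hilton": 3.0, "best": 2.0, "western": 2.0,
}
DEFAULT_WEIGHT = 1.0  # any word not listed above


def weighted_overlap(a, b):
    """Score two names from 0.0 to 1.0 by the weight of the words they share."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())

    def weight(token):
        return TOKEN_WEIGHTS.get(token, DEFAULT_WEIGHT)

    total = sum(weight(t) for t in tokens_a | tokens_b)
    shared = sum(weight(t) for t in tokens_a & tokens_b)
    return shared / total if total else 0.0


print(weighted_overlap("Hilton Manchester", "Hilton Hotel Manchester"))
print(weighted_overlap("Grand Hotel", "Royal Hotel"))
```

With these weights, two names that share only the word Hotel score very low, while names that share a chain name and a location score highly even if one of them omits the word Hotel.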
Abbreviations and Acronyms
Another common issue to consider is abbreviations and acronyms.
It’s commonplace for business entities to use acronyms alongside their company name. There are many examples of these, such as HP / Hewlett Packard, BT / British Telecom and GM / General Motors.
Additionally, we also have common terms that get abbreviated in order to make data entry simpler, with ampersands (‘&’) used instead of ‘and’, ‘Ltd’ for ‘Limited’, ‘Corp’ for ‘Corporation’ etc.
So we also need to have an extensive library of knowledge that can be called upon to help identify and approve matches.
However, it is not always straightforward, as there are often exceptions to every rule.
For example, the term Limited can be used to refer to a limited liability company, but it could also be part of the organisation name, as in the business ‘Limited Brands Inc’. Removing the term Limited in this instance would result in poor matching.
We also have other complexities to deal with as the same abbreviation can have multiple meanings, and this can vary by country.
For instance, BT would likely mean British Telecom when looking at UK data, but in Australia we have the company ‘BT Lawyers PTY LTD’, which is just one of many potential issues.
So we need intelligent application of this extensive knowledge; one rule does not fit every scenario.
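To illustrate, here is a minimal sketch of a lookup that expands abbreviations and only expands an acronym when the country makes that meaning likely. The tables, the country codes and the function name normalise are hypothetical; a real knowledge library would be far larger and would need many more exception rules.

```python
# Illustrative lookup tables; a real library would be far larger (assumption).
ABBREVIATIONS = {"&": "and", "ltd": "limited", "corp": "corporation"}

# Acronym expansions keyed by (acronym, country code). A key with country None
# applies everywhere. These entries are hypothetical examples.
ACRONYMS = {
    ("bt", "GB"): "british telecom",
    ("hp", None): "hewlett packard",
    ("gm", None): "general motors",
}


def normalise(name, country=None):
    """Lower-case a name and expand known abbreviations and acronyms."""
    expanded = []
    for raw in name.split():
        token = raw.strip(".,").lower()
        if not token:
            continue
        if (token, country) in ACRONYMS:
            expanded.append(ACRONYMS[(token, country)])
        elif (token, None) in ACRONYMS:
            expanded.append(ACRONYMS[(token, None)])
        else:
            expanded.append(ABBREVIATIONS.get(token, token))
    return " ".join(expanded)


print(normalise("BT", country="GB"))
print(normalise("BT Lawyers PTY LTD", country="AU"))
```

Keying the acronym table by country means ‘BT’ is left alone in the Australian record while still being expanded for UK data.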
Non-Exact Company Name Matching
The easy part of matching company names is generally exact matching: if the data is the same then it’s a match. Where it gets more complicated is when we have to make decisions on non-exact matches.
This is when we start to explore the use of fuzzy logic, computational algorithms, probabilistic reasoning, machine learning and other artificial intelligence technologies.
However, with fuzzy logic you solve one problem and create another. Fuzzy logic can generate likely match candidates and give each a likelihood score, but eventually you have to make a decision: are they a match or not?
Process-wise, you either make that decision systematically on behalf of the users or you ask the users to make the decision themselves.
If you ask the user to make the decision, you typically add a large amount of work and time, and hence cost, to your matching project. If you make the decisions automatically, then you have to choose a threshold or series of thresholds to base your decision on, and find a compromise you are happy with that balances the number of matches against the trustworthiness of the matches.
It is always the challenge of matching to balance quality with quantity, usually determined by cost effectiveness.
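The trade-off can be sketched with two thresholds: scores above the upper one are accepted automatically, scores between the two are sent to a user for review, and everything else is rejected. This example uses Python’s standard difflib.SequenceMatcher purely as a stand-in for a fuzzy scoring method, and the threshold values are assumptions you would tune against your own data.

```python
from difflib import SequenceMatcher

# Assumed thresholds for illustration only; tune them on your own data.
AUTO_ACCEPT = 0.92
REVIEW = 0.75


def similarity(a, b):
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def decide(a, b):
    """Return (decision, score), where decision is 'match', 'review' or 'no match'."""
    score = similarity(a, b)
    if score >= AUTO_ACCEPT:
        return "match", score
    if score >= REVIEW:
        return "review", score
    return "no match", score


for pair in [("Tesco PLC", "Tesco PLC"),
             ("Ford Motor Co", "Ford Motor Company"),
             ("Microsoft Corporation", "Tesco PLC")]:
    print(pair, decide(*pair))
```

Raising AUTO_ACCEPT favours quality at the expense of quantity, while lowering REVIEW sends more candidates to users and so increases cost.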
By having users validate your more complex matches and capturing their input, you can provide essential input into machine learning processes and help tip the quality / quantity seesaw in favour of quality.
Noise Words in the Company Name
Another important consideration when matching company names is noise words. It is not uncommon for company names to include many noise words that can be a distraction for the matching engine.
For example: ‘European Headquarters of Ford Motor Company’.
Typical matching engines create a match key from a company name to help speed up the process of finding good matches. The key is often built from the first 16 or so characters of the company name, alongside some fuzzy logic processes to remove duplicate letters and some phonetic-based logic. When noise words are included, however, this key approach is of limited use.
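A simplified sketch of such a key shows why noise words are a problem. The match_key function below is our own illustration: it keeps letters only, collapses repeated letters and truncates to 16 characters. The phonetic step is left out, and the order of the steps is an assumption.

```python
import re


def match_key(name, length=16):
    """Build a crude match key: letters only, repeated letters collapsed, truncated."""
    letters = re.sub(r"[^a-z]", "", name.lower())
    collapsed = re.sub(r"(.)\1+", r"\1", letters)  # 'fordd' becomes 'ford'
    return collapsed[:length]


print(match_key("Ford Motor Company"))
print(match_key("European Headquarters of Ford Motor Company"))
```

The first name gives the key ‘fordmotorcompany’, while the second gives ‘europeanheadquar’. The distinctive part of the name never reaches the key, so the two records would not be brought together for comparison.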
Our approach is very different, and our logical processes assess a company name and determine which elements are of most importance, helping us to find the most appropriate matches.
Trading Names / Former Names and other considerations
Many CRM systems and other databases allow business names to be stored with additional insight. For instance, in the D&B WorldBase we see a Former Name and up to 5 different trading names that a business trades under. It is not uncommon for a business to be known to people only by its trading name, with people unaware of the actual legal name the business is registered under. For example, a franchise business will often trade under a recognised brand name but be owned and operated by a different legal entity.
Former names can also be very helpful when matching, as your data may be more up to date than the data you are matching against. Having the former name available can therefore help you match more data, more easily.
Often matching systems will overlook this as it adds many more steps into the matching process and can be very time consuming without the necessary computing power to process this quickly.
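The extra steps amount to comparing each incoming name against every name a record is known by. Here is a minimal sketch using exact, case-insensitive comparison. The CompanyRecord structure, its field names and the company names in the usage example are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CompanyRecord:
    # Hypothetical record layout; field names are assumptions.
    legal_name: str
    former_names: list = field(default_factory=list)
    trading_names: list = field(default_factory=list)

    def all_names(self):
        return [self.legal_name, *self.former_names, *self.trading_names]


def find_by_any_name(query, records):
    """Return records whose legal, former or trading name equals the query."""
    wanted = query.strip().lower()
    return [r for r in records
            if any(wanted == n.strip().lower() for n in r.all_names())]


records = [
    CompanyRecord("Example Foods Ltd", former_names=["Example Catering Ltd"],
                  trading_names=["Sunny Cafe"]),
    CompanyRecord("Another Business Ltd"),
]
print([r.legal_name for r in find_by_any_name("sunny cafe", records)])
```

Each record now contributes several names to compare rather than one, which is where the extra processing time comes from.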
When you are looking to match a company name, you can often make the process easier if you have additional information that can be used to help find the best matches.
By using an address with your company name, you can help screen out lots of similar company names that are unlikely to be the same.
Websites can also be helpful in this way, as can telephone numbers, fax numbers, latitude / longitude coordinates, postal codes, countries etc.
Generally, the more good-quality, relevant information you can provide to the matching engine, the better the results you will get.
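One common way to use this extra information is to group records by a shared field, such as postal code, and only compare names within the same group. The sketch below assumes records are simple dictionaries with ‘name’ and ‘postcode’ keys and again uses difflib.SequenceMatcher as a stand-in scorer. The names, postcodes and the 0.75 cut-off are invented for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def clean(value):
    """Normalise a postal code so 'ab1 2cd' and 'AB12CD' compare equal."""
    return value.replace(" ", "").upper()


def group_by_postcode(records):
    groups = defaultdict(list)
    for record in records:
        groups[clean(record.get("postcode", ""))].append(record)
    return groups


def candidates(query, groups, min_score=0.75):
    """Compare the query name only against records sharing its postal code."""
    same_area = groups.get(clean(query.get("postcode", "")), [])
    scored = []
    for record in same_area:
        score = SequenceMatcher(None, query["name"].lower(),
                                record["name"].lower()).ratio()
        if score >= min_score:
            scored.append((record["name"], record["postcode"]))
    return scored


reference = [
    {"name": "Acme Trading Limited", "postcode": "AB1 2CD"},
    {"name": "Acme Trading Ltd", "postcode": "ZZ9 9ZZ"},
]
groups = group_by_postcode(reference)
print(candidates({"name": "Acme Trading Ltd", "postcode": "ab1 2cd"}, groups))
```

Note that in this sketch records with no postal code all fall into one group, so a real solution would need a fallback for missing or poor-quality fields.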
Entity matching for company names involves many considerations. We hope that this information will prove useful to you when choosing or building a solution for company name matching.
Match2Lists provides one of the most comprehensive and powerful matching solutions available today, using state-of-the-art in-memory parallel processing technology and an industry-leading visual interface.