Dating Services and Bayesian Logic

This post is Part 2 of a series on Valuation Metrics Technology and the mathematics behind it.

Match Score Utility and Dating Services

Valuation Metrics’ Match Scores work a lot like a dating service. Just as dating services match single people by finding common interests, Valuation Metrics matches funds and companies that have similar metrics. Neither service is perfect – a dating service will sometimes say two people were made for each other when it turns out that they can’t stand one another (false positive). Similarly, our Match Scores will occasionally indicate a good fit between a fund and a company when that may not be the case. Overall, however, both services predict far better than chance, which is why dating sites are so popular and why our clients find our targeting system so useful.

To appreciate the power of our match scoring algorithms, we need to first understand how to interpret our backtesting results. The results are best explained by a concept called Bayes’ Theorem, which is a simple mathematical formula used for calculating conditional probability (the probability of one event occurring given that another event has already happened). Though the concept is simple, Bayesian logic itself is somewhat counterintuitive.

Bayesian Logic – Dating Services

Consider the following observations a particular dating service made regarding its members:

  • (a)  1% of the 10,000 couples who met through the service ended up getting married
  • (b)  80% of the couples who got married shared similar interests
  • (c)  10% of the couples who didn’t get married shared similar interests

What is the probability that a couple who share similar interests gets married?

Most people reason that since 80% of all married couples share similar interests, 80% of all couples with similar interests must get married. This conclusion is erroneous, however, because it fails to take into account the base rate at which couples get married overall. To find the answer we’re looking for, known in statistical jargon as the posterior probability, we need all three pieces of information above. Together they give us the base rate (the prior probability), the conditional probability, and the marginal probability.

  • prior probability = P(married) = (a) = 1%
  • conditional probability = P(similar interests | married) = (b) = 80%
  • marginal probability = P(similar interests) = (b)x(a) + (c)x[1-(a)] = (80%)x(1%) + (10%)x(99%) = 10.7%

Substituting “M” for “married” and “I” for “similar interests,” the posterior probability, which is P(married | similar interests), can then be found using the equation for Bayes’ theorem:

P(M | I) = [P(I | M) x P(M)] / P(I) = (80%)x(1%)/(10.7%) = 7.5%

The numerator is the joint probability that a couple both a) got married (prior probability) and b) shared similar interests given that they were married (conditional probability). Dividing it by the marginal probability that a couple had similar interests gives the posterior probability.
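The calculation above can be sketched in a few lines of Python. This is only an illustration of Bayes’ theorem applied to the dating-service figures; the function name `posterior_probability` and its parameter names are chosen for this example and are not part of any Valuation Metrics software.

```python
def posterior_probability(prior, p_evidence_given_event, p_evidence_given_no_event):
    """Return P(event | evidence) using Bayes' theorem."""
    # Marginal probability of the evidence, P(I), by the law of total probability.
    marginal = (p_evidence_given_event * prior
                + p_evidence_given_no_event * (1 - prior))
    # Joint probability P(I | M) x P(M), divided by the marginal P(I).
    return p_evidence_given_event * prior / marginal


# Dating-service figures: (a) = 1%, (b) = 80%, (c) = 10%.
p_married_given_similar = posterior_probability(0.01, 0.80, 0.10)
print(f"P(M | I) = {p_married_given_similar:.1%}")
```

Passing the rate (c) rather than the marginal probability lets the function derive P(I) itself, which mirrors how the 10.7% figure was obtained above.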

Bayes’ theorem enables us to determine how much original probabilities change as a result of new information. In this case, the original probability – that couples get married – was 1%. When new information was introduced – the inclusion of matching based on similar interests – the rate at which couples got married, given that they shared similar interests, went up to 7.5%.

We can get a better understanding of the numbers used in Bayes’ formula by looking at the number of couples in each respective probability group and then plotting that information on a Venn diagram.

Before considering Similar Interests:

  • Group 1: 1% of 10,000 = 100 couples got married
  • Group 2: 10,000 – 100 = 9,900 couples didn’t get married

After considering Similar Interests:

  • Group A: 80% of 100 = 80 couples got married and had similar interests
  • Group B: 100 – 80 = 20 couples got married but didn’t have similar interests (false negative)
  • Group C: 10% of 9,900 = 990 couples didn’t get married but had similar interests (false positive)
  • Group D: 9,900 – 990 = 8,910 couples didn’t get married and didn’t have similar interests

The proportion of couples who got married among those that shared similar interests is the proportion of Group A within Groups A+C:  80/(80+990) = 7.5%.  It can be interpreted visually by looking at the Venn diagram below:

[Figure “Bayesian1”: Venn diagram of married couples and couples with similar interests]
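The same head counts can be reproduced with a short script. This is a sketch of the bookkeeping only; the variable names are illustrative, and integer percentages are used so that the group sizes stay exact whole numbers.

```python
total_couples = 10_000

# Before considering similar interests.
married = total_couples * 1 // 100        # Group 1
not_married = total_couples - married     # Group 2

# After considering similar interests.
group_a = married * 80 // 100             # married, similar interests
group_b = married - group_a               # married, no similar interests (false negative)
group_c = not_married * 10 // 100         # not married, similar interests (false positive)
group_d = not_married - group_c           # not married, no similar interests

# Share of similar-interest couples who married: Group A within Groups A+C.
similar_interests = group_a + group_c
share_married = group_a / similar_interests

print(group_a, group_b, group_c, group_d)
print(f"{share_married:.1%} of couples with similar interests got married")
```

Counting couples rather than multiplying probabilities gives the same 7.5% and makes it easy to see that the 990 false positives dominate the 1,070 couples with similar interests.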

On the surface, the results for matching up couples based on whether or not they have similar interests may seem rather discouraging. After all, 92.5% of the time (990/1070) couples that had similar interests didn’t get married. Does matching up couples based on similar interests offer any real benefit in terms of predicting which couples will get married? Why do dating services even bother looking at similar interests if it is wrong so much of the time?

The seeming inconsistency lies in the low overall probability of couples getting married. Since so few couples get married (1%), the number of couples that don’t get married is extremely large (9,900), so even a fairly low rate of false positives (10%) produces a large number of couples who had similar interests but didn’t get married. This doesn’t mean that matching couples based on similar interests is worthless, however.

The value of matching couples based on similar interests is determined by a statistical measure called an Impact Value. An Impact Value relates conditional probability to overall probability. Unless we consider the percentage of couples that get married overall, it is impossible to determine the value of matching couples based on similar interests. For instance, it is intuitively obvious that if couples with similar interests got married at the same rate as couples overall, then matching couples based on similar interests would be of no additional value. The fact that 7.5% of couples who share similar interests get married means nothing unless we relate it to the percentage of all couples who get married (1%).

The Impact Value is calculated by taking the ratio of the two percentages:

Impact Value = 7.5%/1% = 7.5

This means that a couple that has similar interests is 7.5 times more likely to get married than couples overall.

Expressed differently, the (80+990)/(10,000) = 10.7% of couples with similar interests account for 80% of all marriages, so they got married at 80%/10.7% = 7.5 times the overall rate.
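The two ways of arriving at the Impact Value can be checked against each other. The sketch below assumes nothing beyond the couple counts already given; `impact_value` is a hypothetical helper name used only for this illustration.

```python
import math


def impact_value(conditional_rate, overall_rate):
    """Ratio of a conditional rate to the corresponding overall rate."""
    return conditional_rate / overall_rate


married, total = 100, 10_000
married_similar, similar = 80, 1_070

# Route 1: marriage rate among similar-interest couples vs. the overall marriage rate.
route_1 = impact_value(married_similar / similar, married / total)

# Route 2: share of all marriages captured vs. share of all couples selected.
route_2 = impact_value(married_similar / married, similar / total)

print(round(route_1, 1), round(route_2, 1))
print(math.isclose(route_1, route_2))
```

Both routes reduce to the same fraction, (80 x 10,000)/(1,070 x 100), which is why they always agree.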

This is why dating services look at members’ interests and why they are so focused on couples who have similar interests.

Bayesian Logic – Match Score Backtesting

The Valuation Metrics’ match scoring algorithms can be thought of in much the same way. The numbers from our backtesting are very similar to those in the example above. Simply substitute the phrase “purchased” for “married” and “match score of 99%” for “had similar interests” in the backtesting results below and you can follow along point for point:

Before considering Match Scores:

  • Group 1: 560,452 companies were purchased (0.9% of the 61,843,236 total).
  • Group 2: 61,843,236 – 560,452 = 61,282,784 companies were not purchased.

After considering Match Scores in the 99% category:

  • Group A: 37,384 companies were purchased and had a match score of 99%.
  • Group B: 560,452 – 37,384 = 523,068 companies were purchased and had a match score below 99% (false negative).
  • Group C: 474,555 companies were not purchased but had a match score of 99% (false positive).
  • Group D: 61,282,784 – 474,555 = 60,808,229 companies were not purchased and had a match score below 99%.

The proportion of purchased companies among those with a match score of 99% is the proportion of Group A within Groups A+C:  37,384/(37,384+474,555) = 7.3%.

The results are illustrated in the Venn diagram below:

[Figure “Bayesian2”: Venn diagram of purchased companies and companies with a match score of 99%]

Impact Value = 7.3%/0.9% = 8.1

  • A company with a match score of 99% is 8.1 times more likely to be purchased than companies overall.

Expressed differently, the (37,384+474,555)/(61,843,236) = 0.83% of companies with a match score of 99% account for 37,384/560,452 = 6.7% of all the purchases, so they were purchased at 6.7%/0.83% = 8.1 times the overall rate.

  • 6.7% of all purchases take place among the companies having a match score of 99%.
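For readers who want to rerun the arithmetic, the four groups for the 99% category can be held in a small record and the percentages derived from it. This is a minimal sketch; the `Backtest` class and its field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backtest:
    purchased_in: int        # Group A: purchased, in the match category
    purchased_out: int       # Group B: purchased, outside the category
    not_purchased_in: int    # Group C: not purchased, in the category
    not_purchased_out: int   # Group D: not purchased, outside the category

    @property
    def total(self):
        return (self.purchased_in + self.purchased_out
                + self.not_purchased_in + self.not_purchased_out)

    @property
    def base_rate(self):
        # Overall purchase rate: Groups A+B within the total.
        return (self.purchased_in + self.purchased_out) / self.total

    @property
    def purchase_rate_in(self):
        # Purchase rate inside the category: Group A within Groups A+C.
        return self.purchased_in / (self.purchased_in + self.not_purchased_in)

    @property
    def share_of_purchases(self):
        # Share of all purchases that fall in the category: Group A within A+B.
        return self.purchased_in / (self.purchased_in + self.purchased_out)

    @property
    def impact_value(self):
        return self.purchase_rate_in / self.base_rate


score_99 = Backtest(37_384, 523_068, 474_555, 60_808_229)
print(f"base rate          {score_99.base_rate:.1%}")
print(f"purchase rate      {score_99.purchase_rate_in:.1%}")
print(f"share of purchases {score_99.share_of_purchases:.1%}")
print(f"impact value       {score_99.impact_value:.1f}")
```

Working from the raw counts avoids compounding the rounding in the quoted percentages.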

After considering Match Scores in the Very High category (>= 80%):

  • Group A: 345,289 companies were purchased and had a Very High match score.
  • Group B: 560,452 – 345,289 = 215,163 companies were purchased and had a match score below Very High (false negative).
  • Group C: 11,808,167 companies were not purchased and had a Very High match score (false positive).
  • Group D: 61,282,784 – 11,808,167 = 49,474,617 companies were not purchased and had a match score below Very High.

The proportion of purchased companies among those with a Very High match score is 345,289/(345,289+11,808,167) = 2.8%.

Impact Value = 2.8%/0.9% = 3.1

  • A company with a Very High match score is 3.1 times more likely to be purchased than companies overall.

Expressed differently, the (345,289+11,808,167)/(61,843,236) = 20% of companies with a Very High match score account for 345,289/(560,452) = 62% of all the purchases.

  • 62% of all purchases take place among the companies deemed a Very High match.
  • Only 38% of all purchases take place among companies deemed a less than Very High match. These companies have only a 215,163/(215,163+49,474,617) = 0.4% chance of being purchased.

After considering Match Scores in the High and Very High categories (>= 60%):

  • Group A: 469,083 companies were purchased and had a High or Very High match score.
  • Group B: 560,452 – 469,083 = 91,369 companies were purchased and had a match score below High (false negative).
  • Group C: 24,511,875 companies were not purchased and had a High or Very High match score (false positive).
  • Group D: 61,282,784 – 24,511,875 = 36,770,909 companies were not purchased and had a match score below High.

The proportion of purchased companies among those with a High or Very High match score is 469,083/(469,083+24,511,875) = 1.9%.

Impact Value = 1.9%/0.9% = 2.1

  • A company with a High or Very High match score is 2.1 times more likely to be purchased than companies overall.

Expressed differently, the (469,083+24,511,875)/(61,843,236) = 40% of companies with a High or Very High match score account for 469,083/560,452 = 84% of all the purchases.

  • 84% of all the purchases take place among companies deemed a High or Very High match.
  • Only 16% of all purchases take place among companies deemed less than a High match. These companies have only a 91,369/(91,369+36,770,909) = 0.2% chance of being purchased.
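The purchase rates for companies that fall below each cut-off (Group B within Groups B+D) can be recomputed in the same way. The helper name `purchase_rate_below` is an assumption made for this sketch.

```python
def purchase_rate_below(purchased_below, not_purchased_below):
    """Purchase rate among companies below the cut-off: Group B within Groups B+D."""
    return purchased_below / (purchased_below + not_purchased_below)


# Overall purchase rate, for comparison.
base_rate = 560_452 / 61_843_236

below_very_high = purchase_rate_below(215_163, 49_474_617)
below_high = purchase_rate_below(91_369, 36_770_909)

print(f"overall         {base_rate:.1%}")
print(f"below Very High {below_very_high:.1%}")
print(f"below High      {below_high:.1%}")
```

Both rates sit well under the overall purchase rate, which is the flip side of the Impact Values above 1 for the higher categories.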

After considering Match Scores that are Outliers (<= 20%):

  • Group A: 10,820 companies were purchased and were Outliers.
  • Group B: 560,452 – 10,820 = 549,632 companies were purchased and were not Outliers.
  • Group C: 11,680,282 companies were not purchased and were Outliers.
  • Group D: 61,282,784 – 11,680,282 = 49,602,502 companies were not purchased and were not Outliers.

The proportion of purchased companies among the Outliers is 10,820/(10,820+11,680,282) = 0.1%.

Impact Value = 0.1%/0.9% = 0.1

  • A company that is an Outlier is one tenth as likely to be purchased as companies overall.

Expressed differently, the (10,820+11,680,282)/(61,843,236) = 19% of companies that are Outliers account for only 10,820/560,452 = 2% of all purchases.

  • Only 2% of all purchases take place among companies that are deemed Outliers.

Impact Value (Very High vs Outlier) = 2.84%/0.093% ≈ 30 (using the unrounded rates; the rounded figures 2.8%/0.1% would give 28)

  • Companies classified as Very High matches are 30 times more likely to be purchased than Outliers.
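The category-by-category results can be collected in one pass from the Group A and Group C counts quoted above. This is an illustrative sketch that works from the raw counts rather than the rounded percentages; the dictionary layout and names are assumptions made for the example.

```python
TOTAL = 61_843_236
PURCHASED = 560_452

# category -> (Group A: purchased, Group C: not purchased)
categories = {
    "99%": (37_384, 474_555),
    "Very High (>= 80%)": (345_289, 11_808_167),
    "High or Very High (>= 60%)": (469_083, 24_511_875),
    "Outlier (<= 20%)": (10_820, 11_680_282),
}

base_rate = PURCHASED / TOTAL

# Purchase rate within each category: Group A within Groups A+C.
rates = {name: a / (a + c) for name, (a, c) in categories.items()}
# Impact Value: category purchase rate relative to the overall rate.
impacts = {name: rate / base_rate for name, rate in rates.items()}

for name in categories:
    print(f"{name:28s} rate {rates[name]:6.2%}  impact {impacts[name]:4.1f}")

very_high_vs_outlier = rates["Very High (>= 80%)"] / rates["Outlier (<= 20%)"]
print(f"Very High vs Outlier: {very_high_vs_outlier:.1f}")
```

Because the Outlier rate is so small, the Very High vs Outlier ratio is sensitive to rounding, so it is best computed from the counts directly.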

The chart below summarizes the backtesting results presented above and breaks the data out by market capitalization (large caps get purchased at a higher rate than small caps, but there is still a clear trend across match score categories):

[Figure “Impact Values”: Impact Values by match score category and market capitalization]

With results like these, is it any wonder that this system is so powerful at bringing together companies and potential investors?

Just as dating services use interests to weed out couples who are extremely unlikely to get married, we suggest Investor Relations representatives not bother with the 60% of funds that fall into our lowest three match categories (Outlier, Low, Moderate), as these funds are very unlikely to purchase their company. Time is much better spent focusing on the funds in our High and Very High match categories, where 84% of all the buying takes place.