Fraud and Deception Detection: Five Language Fingerprints
Last month, I described how computer-aided text-based analysis can help uncover fraud and deception in company communications. But what other insights can we glean from this research into scandal companies?
We used Deception And Truth Analysis (D.A.T.A.) to examine 10 of the largest corporate scandals in recent history and found that the average lead time between our textual identification of deception and the public recognition of possible scandal was more than six years.
Corporate Scandals: Time between Textual Evidence and Public Recognition
Ticker | Company | Size (US$ Millions) | Scandal Year | Average Alert Score in Lead-Up | Average Alert Score Pre-Scandal | Years Warning |
ACC | Adelphia | $2,300 | 2002 | -46% | -44.8% | 2 |
AIG | AIG | $3,900 | 2005 | -30.6% | -52.4% | 12 |
CUC | Cendant | $640 | 1998 | -37.9% | -48.8% | 3 |
ENRN | Enron | $74,000 | 2001 | -87.4% | -76.3% | 8 |
HLS | HealthSouth | $1,400 | 2003 | -42.2% | -27.1% | 9 |
LEH | Lehman Bros. | $50,000 | 2008 | -37.2% | -3.8% | 13 |
SAY | Satyam | $1,400 | 2009 | -28.9% | -38.4% | 6 |
TYC | Tyco International | $600 | 2002 | -77.1% | -81.7% | 7 |
WCOM | WorldCom | $3,800 | 2001 | -33.9% | -47.9% | 4 |
WM | Waste Management | $6,000 | 1997 | -39.4% | -41.1% | 2 |
Total | $144,290 | Average | -40.3% | 6.6 |
The obvious question is why. Why does it take regulators and markets so long to recognize these scandals? And a follow-up question: What insights from text-based analysis can we use to better identify these scandals earlier? Let’s take these in turn.
Theory: It’s the Behavior
Why does D.A.T.A. detect deception faster than acutely interested investors and regulators? After thinking about this for a while, we developed a theory, and it boils down to 86.5%. That is the percentage of financial information that is expressed in text, not in numbers, in annual reports. Text communications reveal the behavior of corporate management teams, and that behavior leads to the outcome that is expressed in numerical performance.
So that 6.6 years between the initial indication of deception and when the scandal breaks is the average length of time that a poorly behaving firm can fake it, until it just can’t massage the numbers any longer.
What is interesting is that the two scandals that took over a decade to recognize both involved financial companies: AIG and Lehman Brothers. Their annual reports ran to hundreds of pages, and the velocity of money cycling through their balance sheets and income and cash flow statements was very, very high. Thus, it took considerable time for their poor behaviors and choices (the inputs) to eventually show up in the numbers (the outputs).
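Both figures, the 6.6-year average and the two decade-plus cases, can be recomputed from the Years Warning column of the table above. The Python snippet below is only an arithmetic check; the variable names are illustrative and are not part of D.A.T.A.

```python
# Years of warning per company, copied from the table above.
years_warning = {
    "Adelphia": 2,
    "AIG": 12,
    "Cendant": 3,
    "Enron": 8,
    "HealthSouth": 9,
    "Lehman Bros.": 13,
    "Satyam": 6,
    "Tyco International": 7,
    "WorldCom": 4,
    "Waste Management": 2,
}

# Simple mean of the Years Warning column.
average_lead_time = sum(years_warning.values()) / len(years_warning)
print(f"Average lead time: {average_lead_time:.1f} years")

# Companies whose warning period exceeded a decade.
long_lead = [name for name, years in years_warning.items() if years > 10]
print("More than ten years of warning:", long_lead)
```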
If this theory is a valid explanation for that lead time, then scandal ought to have language fingerprints that investors can dust for as either an early warning system or as a second opinion on the normal fundamental work that investment research teams conduct.
Language that Reveals Possible Scandal
After examining the 10 scandals above as well as Wirecard and other more recent controversies, we identified five textual fingerprints that differ from those of more truthful companies by more than 50%.
Scandal Words and Company Communications
Language Fingerprint | Incidence Relative to the Mean |
Words Indicating Friendship | +56.1% |
Words Indicating Risk | +55.9% |
Impersonal Pronouns | +54.1% |
Words That Indicate Differences | -53.6% |
Words That Negate a Statement | +50.4% |
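One plausible reading of “incidence relative to the mean” is the percentage difference between how often a company uses a word category and how often the average company does. The Python sketch below assumes that reading. The function name and the sample rates are hypothetical and are not taken from D.A.T.A.

```python
def relative_incidence(rate: float, mean_rate: float) -> float:
    """Percentage difference between a company's rate and the mean rate."""
    if mean_rate <= 0:
        raise ValueError("mean_rate must be positive")
    return (rate - mean_rate) / mean_rate * 100.0


# Hypothetical rates, in occurrences per 1,000 words.
mean_risk_rate = 5.0
company_risk_rate = 7.8

difference = relative_incidence(company_risk_rate, mean_risk_rate)
print(f"Risk words relative to the mean: {difference:+.1f}%")

# The article's cutoff: fingerprints differ from the mean by more than 50%,
# in either direction (difference words run below the mean).
print("Exceeds 50% cutoff:", abs(difference) > 50.0)
```

A positive result means the category is overused relative to the average company; a negative result, as with words that indicate differences, means it is underused.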
In addition to text-based analysis, we also conducted one-on-one conversations to better distinguish between deception and truth and to identify some of the more pan-cultural deceptive behaviors people engage in. Our findings aligned with what previous lie detection researchers had uncovered: that each of the five potential deception indicators that surface in text-based analysis also occurs in person-to-person interviews.
So let’s drill a bit deeper into each of them.
1. Words Indicating Friendship
Lie detection researchers have shown that deceivers often employ obfuscation to create confusion. One way they do this is by using words that imply friendship more often than the norm in business communications. Deceptive companies employ such terms 56.1% more than the average, according to our analysis. So if an annual report includes a number of ingratiating terms, it may be evidence of obfuscation and deception.
But a distinction is crucial here: Words that indicate friendship — “friend,” “pal,” “neighbor,” and “gang,” for example — are different from friendly words.
2. Risky Words
Scandal firms favor words that indicate risk at a much higher proportion than the average company. These include such terms as “averse,” “avoid,” “concern,” “difficulty,” “prevent,” “stopped,” and so on. These types of words already tend to raise securities researchers’ hackles, and as we pointed out in the last piece, firms are proactively excising these kinds of “red flag” words from their annual reports.
3. Impersonal Pronouns
“Another,” “everybody,” “someone,” and “whichever” are the sort of impersonal pronouns that dishonest firms employ to a much greater extent — 54.1% more often — than their truthful peers. Why do they prefer to be impersonal in their communications? Researchers theorize that they are trying to create emotional space between themselves and those they wish to mislead.
4. Words That Indicate Difference
Lying is cognitively demanding. One manifestation of this is that during the act of deception, the liar is often unable to make distinctions among competing points of view in their communications and so is less likely to draw comparisons. So the use of words that suggest difference is actually an indication of truthfulness. Constructions that present contrasting viewpoints, such as “as compared with other years . . .”, are examples of this.
Deceivers also have an agenda: to convince their target to believe their preferred narrative. They are unlikely to draw distinctions between that narrative and competing ones and will tend to focus on their preferred one.
5. Words That Negate a Statement
Research also indicates that liars often employ more negative terms than truth tellers. This is why we drew the distinction between words indicating friendship and words that are friendly.
But researchers do not always find that the deceivers are more negative than the truthful. Our analysis of dishonest firm communications suggests, however, that they tend to use such words as “not,” “never,” “should not,” “does not,” and “must not” at a 50.4% greater proportion than the average.
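To show how such fingerprints could be counted in practice, here is a minimal Python sketch that measures how often each word category appears per 1,000 words of text. It is an illustration under stated assumptions, not the D.A.T.A. method: the word lists contain only the example terms quoted in this article, the `category_rates` function and the sample sentence are hypothetical, and difference words are omitted because the article gives only a phrase as an example.

```python
import re
from collections import Counter

# Example terms quoted in this article only; a real lexicon would be far larger.
CATEGORY_WORDS = {
    "friendship": {"friend", "pal", "neighbor", "gang"},
    "risk": {"averse", "avoid", "concern", "difficulty", "prevent", "stopped"},
    "impersonal_pronouns": {"another", "everybody", "someone", "whichever"},
    # "should not", "does not", and "must not" are caught via the token "not".
    "negation": {"not", "never"},
}


def category_rates(text: str) -> dict[str, float]:
    """Return occurrences per 1,000 words for each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in CATEGORY_WORDS}
    counts = Counter(words)
    return {
        name: 1000.0 * sum(counts[word] for word in vocab) / len(words)
        for name, vocab in CATEGORY_WORDS.items()
    }


# Hypothetical sample text, far shorter than a real annual report.
sample = "We could not avoid the difficulty. Someone never raised a concern."
rates = category_rates(sample)
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 1,000 words")
```

Rates like these only become meaningful when compared with a baseline drawn from many companies' reports, and on samples far longer than a single sentence.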
Bonus
So what is by far the strongest indicator of deception? The number of swear words in an annual report. Though they are rarities, swear words occur in scandal company annual reports a whopping 277.1% more frequently than the mean.
All posts are the opinion of the author. As such, they should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute or the author’s employer.
Great job. Thanks.
Hello Ivan,
Thank you, much appreciated.
With smiles,
Jason
That’s pretty interesting. Thanks for sharing the insights. It looks like you took the “ad” criticism on your last piece seriously. Appreciated!
It would be a lot more interesting if you could give examples of the swear words in the annual reports.
A question has absorbed me after reading this. Will the behavior of writers vary when English is their second language, as in several Asian countries where English is the language of offices but not that of the land? Will they drop different clues for deception detection?
Hello Areeb,
Thanks for taking the time to comment and to share your thoughts. In answer to some of your questions…
Research on how deceivers use language has been conducted in multiple cultures and languages. The results indicate that liars tend to behave very similarly across the globe. That said, my colleagues at Orbit Financial Technology and I are currently verifying some of these assumptions using Mandarin.
As for examples of swear words…I am aware of only a handful. Because these words occur so rarely, it is hard to prescribe which ones to look for specifically. More important is that swear words are indicative of a kind of attitude on the part of management that is to be avoided.
With smiles,
Jason
Hi Jason,
Thank you for responding. This kind of research surely deserves to be part of the CFA Program curriculum, or say the CFA Program curriculum deserves to be enriched with this type of research. However, I will be sad to see that special competitive edge lost when the knowledge is made available to a larger domain. So this needs to be ongoing as an egg and chicken cycle. Looks like you have a lot of evolutionary work ahead 🙂
All the best,
Areeb
This is spot on and I’d love to see this integrated into the CFA Program as part of the level II curriculum. Learning ratios and other financial statement analysis techniques is important, but so is mitigating substantial losses due to fraud.
Hello Kaon,
That kind of work added to the CFA curriculum would be remarkable and demonstrate a significant advance in the exam authors’ thinking. As disclosed in this article and the first one in this series, 86.5% of information in an annual report is text-based, but the CFA Program has (to my knowledge) no techniques for assessing textual information.
With smiles,
Jason
Great work! Congrats on the results! I’d love to apply these concepts here in Brazil.
Hello Phillip,
Feel free to reach out to me via the website: http://www.deceptionandtruthanalysis.com and we can set up a time to talk about your use cases and needs.
And thank you for your kind words.
With smiles,
Jason
We perceive the world through language; it’s only right that signs of fraud show up in language first.
Thanks, Jason. Insightful as always.
What was the rate of false positives? Have you done an analysis of whether this would make money in a portfolio?
Hi David,
The rate of false positives is reported by the size of the word count sample. At 850 words, our recommended minimum sample size, it is 11.9%. For word samples of 1,150 words and higher, it is 7.1%. As for making money in a portfolio, that story is to be told in next month’s article.
With smiles,
Jason
I know it’s probably not a question for a CFA blog, but do you have any examples of current companies that are starting to fall into these thresholds? If you can’t share that here, are there any places online where you have shared an opinion piece on this?