Fraud and Deception Detection: Five Language Fingerprints

By Jason Voss, CFA

Posted In: Economics, Financial Statement Analysis, Leadership, Management & Communication Skills, Performance Measurement & Evaluation, Risk Management, Standards, Ethics & Regulations (SER)

Last month, I described how computer-aided text-based analysis can help uncover fraud and deception in company communications. But what other insights can we glean from this research into scandal companies?

We used Deception And Truth Analysis (D.A.T.A.) to examine 10 of the largest corporate scandals in recent history and found that the average lead time between our textual identification of deception and the public recognition of possible scandal was more than six years.

Corporate Scandals: Time between Textual Evidence and Public Recognition

Ticker	Company	Size (US$ millions)	Scandal Year	Average Alert Score in Lead-Up	Average Alert Score Pre-Scandal	Years of Warning
ACC	Adelphia	$2,300	2002	-46%	-44.8%	2
AIG	AIG	$3,900	2005	-30.6%	-52.4%	12
CUC	Cendant	$640	1998	-37.9%	-48.8%	3
ENRN	Enron	$74,000	2001	-87.4%	-76.3%	8
HLS	HealthSouth	$1,400	2003	-42.2%	-27.1%	9
LEH	Lehman Bros.	$50,000	2008	-37.2%	-3.8%	13
SAY	Satyam	$1,400	2009	-28.9%	-38.4%	6
TYC	Tyco International	$600	2002	-77.1%	-81.7%	7
WCOM	WorldCom	$3,800	2001	-33.9%	-47.9%	4
WM	Waste Management	$6,000	1997	-39.4%	-41.1%	2
	Total	$144,290		Average	-40.3%	6.6
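
The 6.6-year figure in the Total row is a simple, unweighted average of the Years of Warning column. The quick Python check below transcribes only that column from the table. It reproduces the average and picks out the two cases with more than a decade of warning:

```python
# Years of warning per company, transcribed from the table above.
years_warning = {
    "Adelphia": 2,
    "AIG": 12,
    "Cendant": 3,
    "Enron": 8,
    "HealthSouth": 9,
    "Lehman Bros.": 13,
    "Satyam": 6,
    "Tyco International": 7,
    "WorldCom": 4,
    "Waste Management": 2,
}


def mean_lead_time(years: dict[str, int]) -> float:
    """Simple (unweighted) average lead time, in years."""
    return sum(years.values()) / len(years)


def longest_leads(years: dict[str, int], threshold: int = 10) -> list[str]:
    """Companies whose warning period exceeded `threshold` years."""
    return sorted(name for name, y in years.items() if y > threshold)


print(round(mean_lead_time(years_warning), 1))  # 6.6
print(longest_leads(years_warning))  # ['AIG', 'Lehman Bros.']
```

The average is not weighted by scandal size. A size-weighted figure would lean heavily on Enron and Lehman Brothers.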

The obvious question is why. Why does it take regulators and markets so long to recognize these scandals? And a follow-up question: What insights from text-based analysis can we use to better identify these scandals earlier? Let’s take these in turn.

Theory: It’s the Behavior

Why does D.A.T.A. detect deception so much earlier than keenly interested investors and regulators? After thinking about this for a while, we developed a theory, and it boils down to 86.5%: the share of the financial information in annual reports that is expressed in text rather than in numbers. Text communications reveal the behavior of corporate management teams, and that behavior drives the outcomes that eventually show up in numerical performance.

So the 6.6 years between the first textual sign of deception and the moment the scandal breaks is, on average, how long a poorly behaving firm can fake it before it can no longer massage the numbers.

What is interesting is that the two scandals that took over a decade to recognize both involved financial companies: AIG and Lehman Brothers. Their annual reports ran in the hundreds of pages, and the velocity of money cycling through their balance sheets and income and cash flow statements was very, very high. Thus, it took considerable time for their poor behaviors and choices — the inputs — to eventually show up in the numbers, or the outputs.

If this theory explains that lead time, then scandal ought to leave language fingerprints that investors can dust for, either as an early warning system or as a second opinion on the fundamental work that investment research teams normally conduct.


Language that Reveals Possible Scandal

After examining the 10 scandals above as well as Wirecard and other more recent controversies, we identified five textual fingerprints that differ from those of more truthful companies by more than 50%.

Scandal Words and Company Communications

Language Fingerprint	Incidence Relative to the Mean
Words Indicating Friendship	+56.1%
Words Indicating Risk	+55.9%
Impersonal Pronouns	+54.1%
Words That Indicate Differences	-53.6%
Words That Negate a Statement	+50.4%
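
The post does not give the formula behind "Incidence Relative to the Mean." One plausible reading, offered here only as an assumption, is the percentage gap between a word category's rate in scandal-company reports and its mean rate across all reports. The sketch below computes that gap and checks it against the 50% cut-off used to select the five fingerprints:

```python
def relative_incidence(group_rate: float, mean_rate: float) -> float:
    """Percent by which a group's word rate differs from the mean rate.

    ASSUMPTION: one plausible definition of "incidence relative to the mean";
    the article does not state its formula.
    """
    if mean_rate <= 0:
        raise ValueError("mean_rate must be positive")
    return (group_rate / mean_rate - 1.0) * 100.0


def is_fingerprint(rel_pct: float, threshold_pct: float = 50.0) -> bool:
    """True when the deviation, in either direction, exceeds the threshold."""
    return abs(rel_pct) > threshold_pct


# The five values reported in the table all clear the 50% bar.
table = {"friendship": 56.1, "risk": 55.9, "impersonal pronouns": 54.1,
         "difference": -53.6, "negation": 50.4}
print(all(is_fingerprint(v) for v in table.values()))  # True

# Hypothetical rates, per 1,000 words.
print(relative_incidence(3.0, 2.0))  # 50.0
```

Taking the absolute value lets a category qualify whether scandal firms use it more (friendship, risk) or less (difference words).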

In addition to text-based analysis, we also conducted one-on-one conversations to better distinguish deception from truth and to identify some of the more pan-cultural deceptive behaviors people engage in. Our findings aligned with what previous lie detection researchers had uncovered: each of the five potential deception indicators that surface in text-based analysis also occurs in person-to-person interviews.

So let’s drill a bit deeper into each of them.

1. Words Indicating Friendship

Lie detection researchers have shown that deceivers often employ obfuscation to create confusion. One way they do this is by using words that imply friendship more often than is normal in business communications. According to our analysis, deceptive companies use such terms 56.1% more often than average. So if an annual report includes a number of ingratiating terms, that may be evidence of obfuscation and deception.

But a distinction is crucial here: Words that indicate friendship — “friend,” “pal,” “neighbor,” and “gang,” for example — are different from friendly words.
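
Counting a category such as friendship words comes down to tokenizing a report and measuring how often words from a category list appear. The count is usually normalized per 1,000 words so that reports of different lengths can be compared. The sketch below uses a tiny hypothetical lexicon built only from the example words above; it is not the D.A.T.A. dictionary. In keeping with the distinction just drawn, "friendly" itself is not counted:

```python
import re
from collections import Counter

# HYPOTHETICAL mini-lexicon built only from the article's examples;
# a real category dictionary would be far larger and carefully validated.
FRIENDSHIP_WORDS = {"friend", "friends", "pal", "pals",
                    "neighbor", "neighbors", "gang", "gangs"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; curly apostrophes are normalized to straight ones."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def rate_per_thousand(text: str, lexicon: set[str]) -> float:
    """Occurrences of lexicon words per 1,000 tokens of text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in lexicon)
    return 1000.0 * hits / len(tokens)


sample = ("Our friends and neighbors in the community are pleased "
          "with our warm, friendly service.")
print(round(rate_per_thousand(sample, FRIENDSHIP_WORDS), 1))  # 142.9
```

The sample sentence has 14 tokens, two of which ("friends," "neighbors") are in the lexicon. Warm, friendly language such as "pleased" or "friendly" scores zero.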

2. Risky Words

Scandal firms favor words that indicate risk at a much higher proportion than the average company. These include such terms as “averse,” “avoid,” “concern,” “difficulty,” “prevent,” “stopped,” and so on. These types of words already tend to raise securities researchers’ hackles, and as we pointed out in the last piece, firms are proactively excising these kinds of “red flag” words from their annual reports.

3. Impersonal Pronouns

“Another,” “everybody,” “someone,” and “whichever” are the sort of impersonal pronouns that dishonest firms employ to a much greater extent — 54.1% more often — than their truthful peers. Why do they prefer to be impersonal in their communications? Researchers theorize that they are trying to create emotional space between themselves and those they wish to mislead.

4. Words That Indicate Difference

Lying is cognitively demanding. One manifestation of this is that, during the act of deception, the liar is often unable to make distinctions among competing points of view and so is less likely to draw comparisons. The use of words that suggest difference is therefore an indication of truthfulness, which is consistent with scandal firms using them 53.6% less often than the mean. Constructions that present contrasting viewpoints, such as “as compared with other years . . . ,” are examples of this.

Deceivers also have an agenda: to convince their target to believe their preferred narrative. They are unlikely to draw distinctions between other narratives and will tend to focus on their preferred one.

5. Words That Negate a Statement

Research also indicates that liars often employ more negative terms than truth tellers. This is why we drew the distinction between words indicating friendship and words that are friendly: warm, positive-toned language is not what we would expect to rise in deceptive communications.

Researchers do not always find that deceivers are more negative than truth tellers. Our analysis of dishonest firms’ communications, however, suggests that they use words such as “not,” “never,” “should not,” “does not,” and “must not” 50.4% more often than average.
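
Several of the example negations ("should not," "does not," "must not") already contain "not." A simple counter can therefore match single tokens and avoid counting the same negation twice. The rule below, including its treatment of contractions such as "doesn't," is an illustrative assumption rather than the article's method:

```python
import re

# HYPOTHETICAL rule based on the article's examples. "should not", "does not"
# and "must not" each contain "not", so counting single tokens counts every
# such phrase exactly once.
NEGATORS = {"not", "never"}


def count_negations(text: str) -> int:
    """Count negating tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
    # ASSUMPTION: contractions ending in n't ("doesn't", "won't") also negate.
    return sum(1 for t in tokens if t in NEGATORS or t.endswith("n't"))


text = "We did not and never will. Margins should not fall, and costs don't rise."
print(count_negations(text))  # 4
```

Forms such as "cannot" or "no" are not covered here and would need their own entries in a fuller lexicon.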

Bonus

So what is by far the strongest indicator of deception? The number of swear words in an annual report. Though they are rarities, swear words occur in scandal company annual reports a whopping 277.1% more frequently than the mean.
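
Taken together, the five fingerprints suggest a simple screen. It computes each category's rate for a document and compares that rate with a peer baseline. It then flags categories that move more than 50% in the direction scandal firms moved: up for four categories and down for difference words. This is a hypothetical sketch; the category names, baseline, and decision rule are assumptions, not D.A.T.A.'s actual scoring. Swear words are left out for simplicity, although the bonus finding suggests they could be added as a sixth, rare category.

```python
# Direction in which scandal firms deviated from the mean, per the table:
# +1 = used the category more, -1 = used it less.
SCANDAL_DIRECTION = {
    "friendship": +1,
    "risk": +1,
    "impersonal_pronouns": +1,
    "difference": -1,
    "negation": +1,
}


def fingerprint_flags(doc_rates: dict[str, float],
                      baseline_rates: dict[str, float],
                      threshold_pct: float = 50.0) -> list[str]:
    """Categories where a document deviates from the baseline by more than
    `threshold_pct` percent in the scandal direction.

    ASSUMPTIONS: both inputs are per-1,000-word rates computed with the same
    (hypothetical) lexicons; the 50% cut-off mirrors the article's selection
    threshold but is not D.A.T.A.'s actual decision rule.
    """
    flags = []
    for category, direction in SCANDAL_DIRECTION.items():
        base = baseline_rates[category]
        if base <= 0:
            continue  # no usable baseline for this category
        rel_pct = (doc_rates[category] / base - 1.0) * 100.0
        if direction * rel_pct > threshold_pct:
            flags.append(category)
    return flags


# Hypothetical rates per 1,000 words.
baseline = {c: 2.0 for c in SCANDAL_DIRECTION}
report = {"friendship": 3.2, "risk": 2.4, "impersonal_pronouns": 3.1,
          "difference": 0.9, "negation": 2.0}
print(fingerprint_flags(report, baseline))
# ['friendship', 'impersonal_pronouns', 'difference']
```

Multiplying by the direction means a 55% drop in difference words counts as a flag, while a 55% rise in them would not.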


All posts are the opinion of the author. As such, they should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute or the author’s employer.


Tags: communication skills, fintech, Investment Management Strategies, Lie Detection, natural language processing, Regulations Standards and Ethics


About the Author(s)

Jason Voss, CFA

Jason Voss, CFA, tirelessly focuses on improving investors' ability to serve end clients. He is the author of The Intuitive Investor, a Foreword Reviews Business Book of the Year finalist, and the CEO of Active Investment Management (AIM) Consulting. Voss also subcontracts for the well-known Focus Consulting Group. Previously, he was a portfolio manager at Davis Selected Advisers, L.P., where he co-managed the Davis Appreciation and Income Fund to noteworthy returns. Voss holds a BA in economics and an MBA in finance and accounting from the University of Colorado.

Ethics Statement

My statement of ethics is very simple, really: I treat others as I would like to be treated. In my opinion, all systems of ethics distill to this simple statement. If you believe I have deviated from this standard, I would love to hear from you: [email protected]

13 thoughts on “Fraud and Deception Detection: Five Language Fingerprints”

Ivan says:

12 March 2021 at 05:19

Great job. Thanks.

1. Jason A. VOSS says:
  
  13 March 2021 at 20:30
  
  Hello Ivan,
  
  Thank you, much appreciated.
  
  With smiles,
  
  Jason
  
Areeb Shujaat says:

14 March 2021 at 00:03

That’s pretty interesting. Thanks for sharing the insights. It looks like you took the “ad” criticism on your last piece seriously. Appreciated!

It would be a lot more interesting if you could give examples of the swear words in the annual reports.

A question has absorbed me since reading this. Will the behavior of writers vary when English is their second language, as in several Asian countries where English is the language of the office but not of the land? Will they leave different clues for deception detection?

1. Jason Voss, CFA says:
  
  15 March 2021 at 12:38
  
  Hello Areeb,
  
  Thanks for taking the time to comment and to share your thoughts. In answer to some of your questions…
  
  Research on the way deceivers use language has been conducted in multiple other cultures and languages. Its results indicate that liars tend to behave very similarly across the globe. That said, my colleagues at Orbit Financial Technology and I are currently verifying some of these assumptions in Mandarin.
  
  As for examples of swear words, I am aware of only a handful. Because these words occur so rarely, it is hard to prescribe which ones to look for specifically. More important is that swear words indicate a kind of management attitude that is to be avoided.
  
  With smiles,
  
  Jason
  
  1. Areeb says:
    
    15 March 2021 at 21:36
    
    Hi Jason,
    
    Thank you for responding. This kind of research surely deserves to be part of the CFA Program curriculum, or rather, the CFA Program curriculum deserves to be enriched with this type of research. However, I will be sad to see that special competitive edge lost when the knowledge is made available to a wider audience. So this needs to be an ongoing, chicken-and-egg cycle. Looks like you have a lot of evolutionary work ahead 🙂
    
    All the best,
    Areeb
    
Kaon says:

14 March 2021 at 11:00

This is spot on, and I’d love to see this integrated into the CFA Program as part of the Level II curriculum. Learning ratios and other financial statement analysis techniques is important, but so is mitigating substantial losses due to fraud.

1. Jason Voss, CFA says:
  
  15 March 2021 at 12:40
  
  Hello Kaon,
  
  Adding that kind of work to the CFA Program curriculum would be remarkable and would demonstrate a significant advance in the exam authors’ thinking. As disclosed in this article and the first one in this series, 86.5% of the information in an annual report is text-based, yet the CFA Program has (to my knowledge) no techniques for assessing textual information.
  
  With smiles,
  
  Jason
  
Phillip Soares says:

15 March 2021 at 09:55

Great work! Congrats on the results! I’d love to apply these concepts here in Brazil.

1. Jason Voss, CFA says:
  
  15 March 2021 at 12:41
  
  Hello Phillip,
  
  Feel free to reach out to me via the website, http://www.deceptionandtruthanalysis.com, and we can set up a time to talk about your use cases and needs.
  
  And thank you for your kind words.
  
  With smiles,
  
  Jason
  
Dinesh da Costa says:

16 March 2021 at 06:08

We perceive the world through language; it’s only right that signs of fraud show up in language first.

Thanks, Jason. Insightful as always.

David Merkel says:

16 March 2021 at 14:41

What was the rate of false positives? Have you done an analysis of whether this would make money in a portfolio?

1. Jason A. Voss, CFA says:
  
  19 March 2021 at 13:22
  
  Hi David,
  
  The false-positive rate depends on the size of the word sample. At 850 words, our recommended minimum sample size, it is 11.9%. For samples of 1,150 words and higher, it is 7.1%. As for making money in a portfolio, that story will be told in next month’s article.
  
  With smiles,
  
  Jason
  
Logan says:

29 March 2021 at 15:01

I know it’s probably not a question for a CFA blog, but do you have any examples of current companies that are starting to fall into these thresholds? If you can’t share that here, are there any places online where you have shared an opinion piece on this?
