Fraud and Deception Detection: Text-Based Analysis
Research analysis relies on our trust.
Among the many factors we consider as fundamental investors are assessments of a company’s strategy, products, supply chain, employees, financing, operating environment, competition, management, adaptability, and so on. Investment professionals conduct these assessments to increase our understanding, yes, but also to increase our trust in the data and the people whose activities the data measure. If we cannot trust the data and the people who created it, then we will not invest. In short, we must trust management.
Our fraud and deception detection methods are only okay.
But by what repeatable method can we evaluate the trustworthiness of companies and their people? Usually the answer is some combination of financial statement analysis and “trust your gut.” Here is the problem with that:
1. Time and resource constraints
Companies communicate information through words more than numbers. For example, from 2009 to 2019, the annual reports of the Dow Jones Industrial Average’s component companies tallied just over 31.8 million words and numbers combined, according to AIM Consulting. Numbers made up only 13.5% of the total.
Now, JP Morgan’s 2012 annual report is 237,894 words. Let’s say an average reader can read and comprehend about 125 words per minute. At this rate, it would take a research analyst approximately 31 hours and 43 minutes to thoroughly read the report. The average mutual fund research analyst in the United States makes around $70,000 per year, according to WallStreetMojo. So that one JP Morgan report costs a firm more than $1,100 to assess. If we are already invested in JP Morgan, we’d perform much of this work just to ensure our trust in the company.
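The arithmetic above can be sanity-checked in a few lines of Python. The reading speed, salary, and 2,000-hour work year are illustrative assumptions, not measured figures:

```python
# Back-of-envelope cost of one analyst thoroughly reading one annual report.
WORDS = 237_894      # JP Morgan 2012 annual report word count
WPM = 125            # assumed reading-with-comprehension speed, words/minute
SALARY = 70_000      # assumed analyst salary, USD/year
WORK_HOURS = 2_000   # assumed working hours per year

hours = WORDS / WPM / 60              # reading time in hours
cost = hours / WORK_HOURS * SALARY    # labor cost of the read, USD

print(f"{hours:.1f} hours, ${cost:,.0f}")
```

At these assumptions the read takes roughly 31.7 hours and costs a bit over $1,100, consistent with the figures above; a different assumed work year shifts the cost estimate proportionally.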
Moreover, quantitative data is always publicly released with a significant time lag. Since a company’s performance is usually disclosed quarterly and annually, the average time lag for such data is slightly less than 90 days. And once the data becomes public, whatever advantage it offers is quickly traded away. Most investment research teams lack the resources to assess every company in their universe or portfolio in near real time, or just after a quarterly or annual report is released.
Conclusion: What is that old line? Oh, yeah: Time is money.
2. Trusting our gut does not work.
Despite the pan-cultural fiction to the contrary, research demonstrates we cannot detect deception through body language or gut instinct. In fact, a meta-analysis of our deception-spotting abilities found a global success rate just 4% better than chance. We might believe that as finance pros we are exceptional. We would be wrong.
In 2017, we measured deception detection skills among finance professionals. It was the first time our industry’s lie detection prowess had ever been put to the test. In short: ouch! Our overall success rate was actually worse than that of the general population: We did not score 54%; we earned an even-worse-than-a-coin-toss 49.4%.
But maybe our strengths are in our own sector. Put us in a finance setting, say on an earnings call, and we’ll do much better, right? Nope, not really. In investment settings, we could detect deception just 51.8% of the time.
There is more bad news here (sorry): Finance pros have a strong truth bias. We tend to trust other finance pros way more than we should. Our research found that we only catch a lie in finance 39.4% of the time. So that 51.8% accuracy rate is due to our tendency to believe our fellow finance pros.
One other tidbit: When assessing statements outside of our domain, we have a strong 64.9% deceptiveness bias. Again, this speaks to our industry’s innate sense of exceptionalism. In an earlier study, our researchers found that we believe we are told 2.14 lies per day outside of work settings and just 1.62 lies per day in work settings, which again points to the truth bias within finance.
Finally, we believe we can detect lies within finance at a 68% accuracy rate, not the actual 51.8% measured. Folks, this is the very definition of overconfidence bias and is delusion by another name.
Conclusion: We cannot trust our guts.
3. Auditors’ techniques audit numbers.
But what about auditors? Can they accurately evaluate company truthfulness and save us both time and money? Yes, company reports are audited. But auditors can only conduct their analyses through a micro-sampling of transactions data. Worse still, auditors’ techniques, like ours, are largely focused on that very small 13.5% of information that is captured numerically. That leaves out the 86.5% of text-based content.
Further, because financial statement analysis — our industry’s fraud detection technique — is one step removed from what the auditors see, it is hardly reliable. Indeed, financial statement analyses are just table stakes: Ours probably won’t differ much from those of our competitors. Just looking at the same numbers as everybody else is unlikely to prevent fraud or generate alpha.
And what about private markets? The investment research community has spent an awful lot of time looking for investment opportunities in that space in recent years. But while private market data are sometimes audited, they lack the additional enforcement mechanism of public market participants’ due-diligence and trading activities. These can sometimes signal fraud and deception.
Conclusion: There has to be another tool to help us fight deception.
Scientifically based text analyses to the rescue
Starting with James W. Pennebaker’s pioneering work, researchers have applied natural language processing (NLP) to analyze verbal content and estimate a transcript’s or written document’s credibility. Computers extract language features from the text, such as word frequencies, psycholinguistic details, or negative financial terms, in effect, dusting for language fingerprints. How do these automated techniques perform? Their success rates are between 64% and 80%.
In personal interactions, as we noted, people can detect lies approximately 54% of the time. But their performance worsens when assessing the veracity of text. Research published in 2021 found that people have about a 50% or coin-flip chance to identify deception in text. A computer-based algorithm, however, had a 69% chance.
But surely adding people to the mix improves the accuracy? Not at all. Our overconfidence as investors sabotages our ability to catch deception even in human-machine hybrid models. The same researchers explored how human subjects evaluated computer judgments of deception that they could then overrule or tweak. When humans could overrule, the computer’s accuracy dropped to a mere 51%. When human subjects could tweak the computer judgments in a narrow range around the algorithms’ evaluation, the hybrid success rate fell to 67%.
Computers can give investment pros a huge advantage in evaluating the truthfulness of company communications, but deception detection methods are not one size fits all.
One computer-driven text-based analysis, published in 2011, had the ability to predict negative stock price performance for companies whose 10-Ks included a higher percentage of negative words. By scanning documents for words and phrases associated with the tone of financial communications, this method searched for elements that may indicate deception, fraud, or poor future financial performance.
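A minimal word-list tone screen of the kind described above looks something like the following sketch. The tiny negative-word set and the sample sentence are placeholders for illustration, not the purpose-built financial lexicon the 2011 study used:

```python
# Naive word-list tone screen: share of "negative" words in a filing's text.
import re

# Placeholder lexicon; real studies use purpose-built financial word lists.
NEGATIVE = {"loss", "impairment", "litigation", "restatement", "decline", "weakness"}

def negative_share(text: str) -> float:
    """Fraction of words in `text` that appear in the negative-word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in NEGATIVE for w in words) / len(words)

sample = "Revenue decline and goodwill impairment drove the quarterly loss."
print(f"{negative_share(sample):.1%}")
```

A screen like this ranks filings by negative-word share; its weakness, as the next paragraph notes, is that issuers can simply stop using the listed words.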
Of course, those businesses whose stock prices were hurt by this technique adapted. They removed the offending words from their communications altogether. Some executives even hired speech coaches to avoid ever uttering them. So word-list analyses have lost some of their luster.
Where do we go from here?
It may be tempting to dismiss all text-based analyses. But that would be a mistake. After all, we have not thrown away financial statement analysis, right? No, instead we should seek out and apply the text-based analyses that work. That means methods that are not easily spoofed, that assess how language is used — its structure, for example — not what language is used.
With these issues in mind, we developed Deception And Truth Analysis (D.A.T.A.) with Orbit Financial. Based on a 10-year investigation of those deception technologies that work in and out of sample — hint: not reading body language — D.A.T.A. examines more than 30 language fingerprints in five separate scientifically proven algorithms to determine how these speech elements and language fingerprints interact with one another.
The process is similar to that of a standard stock screener. That screener identifies the performance fingerprints we want and then applies these quantitative fingerprints to screen an entire universe of stocks and produce a list on which we can unleash our financial analysis. D.A.T.A. works in the same way.
A key language fingerprint is the use of articles: a, an, and the. An excess of these is more associated with deceptive than truthful speech. But article frequency is only one component: How the articles are used is what really matters. And since articles are directly connected to nouns, D.A.T.A. is hard to outmaneuver. A potential dissembler would have to alter how they communicate, changing how they use their nouns and how often they use them. This is not an easy task, and even if successful it would only counteract a single D.A.T.A. language fingerprint.
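As a rough illustration of a frequency-style fingerprint (not D.A.T.A.’s actual algorithms, which are proprietary and also model how articles interact with nouns), one can measure article usage per 100 words:

```python
# Illustrative frequency fingerprint: articles per 100 words of text.
import re

ARTICLES = {"a", "an", "the"}

def article_rate(text: str) -> float:
    """Articles per 100 words; a crude single-feature fingerprint."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 100 * sum(w in ARTICLES for w in words) / len(words)

rate = article_rate("the cat sat on a mat")  # 2 articles in 6 words
```

In practice, what matters is comparing a speaker’s rate across their own transcripts over time rather than reading any single number in isolation.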
The other key findings from recent D.A.T.A. tests include the following:
- Time and Resource Savings: D.A.T.A. assesses over 70,400 words per second, or the equivalent of a 286-page book. That is a 99.997% time savings over people and a cost savings of more than 90%.
- Deception Accuracy: Each of the five algorithms is measured at a deception detection accuracy rate far above what people can achieve in text-based analyses. Moreover, the five-algorithm combination makes D.A.T.A. difficult to work around. We estimate its accuracy exceeds 70%.
- Fraud Prevention: D.A.T.A. could identify the 10 largest corporate scandals of all time — think Satyam, Enron — with an average lead time in excess of six years.
- Outperformance: In one D.A.T.A. test, we measured the deceptiveness of each component of the Dow Jones Industrial Average each year. In the following year, we bought all but the five most deceptive Dow companies. From 2009 through 2019, we repeated the exercise at the start of each year. This strategy resulted in an average annual excess return of 1.04%, despite the sometimes nine-month lag in implementing the strategy.
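The outperformance test above amounts to a simple exclusion screen. A sketch of the annual rebalance logic, using hypothetical tickers and deception scores rather than any actual D.A.T.A. output, might look like:

```python
# Hypothetical exclusion screen: hold all index components except the
# n with the highest deception scores from the prior year.
def build_portfolio(scores: dict[str, float], n_excluded: int = 5) -> list[str]:
    """Return the held tickers after dropping the n_excluded most deceptive."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # most deceptive first
    return sorted(ranked[n_excluded:])

# Hypothetical prior-year scores (higher = more deceptive).
scores = {"AAA": 0.9, "BBB": 0.8, "CCC": 0.7, "DDD": 0.6, "EEE": 0.5,
          "FFF": 0.4, "GGG": 0.3, "HHH": 0.2}
print(build_portfolio(scores))  # → ['FFF', 'GGG', 'HHH']
```

The real test would rescore all 30 Dow components each year and rebalance at the start of the next; the mechanics are the same as any quantitative screen.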
The writing is on the wall. Text-based analyses that leverage computer technology to detect fraud and deception result in significant savings in both time and resources. Future articles in this series will detail more D.A.T.A. test results and the fundamental analysis wins that this kind of technology makes possible.
If you liked this post, don’t forget to subscribe to the Enterprising Investor.
All posts are the opinion of the author. As such, they should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute or the author’s employer.
Image credit: Getty Images / broadcastertr
Professional Learning for CFA Institute Members
CFA Institute members are empowered to self-determine and self-report professional learning (PL) credits earned, including content on Enterprising Investor. Members can record credits easily using their online PL tracker.
CFA Institute needs to cool it with the thinly veiled advertisements masquerading as articles.
It’s true that some firms are selling NLP tools to analyze shareholder reports. Most of these tools just look for keywords, in spite of the hype and marketing claims of more sophisticated techniques. There are some tools using more sophisticated techniques, but they are small in number and very few people know how to use them properly.
If CFA Institute wanted to actually educate members, they might mention that the exact same tools can be used to polish up company publications. Where is the balance in this advertisement-cum-blog post?
Don’t kid yourself into thinking senior management doesn’t hire speech coaches, marketing wizards, advertising executives, PR teams, and yes… consultants that specialize in NLP techniques.
Your firm can spend thousands buying an NLP tool and buying the training or consultants to figure out how to use said tool. Just know that companies are already hiring consultants, in many cases the same consultants, to sanitize and homogenize their publications. It is an arms race, and the buy side is playing catch-up, not getting ahead.
Also know that all this polishing and sanitizing and homogenizing tends to make these reports even more vanilla than they are already.
Too bad CFA Institute doesn’t tell both sides of the story… it might help if they hired members to write articles, instead of having freelance journalists and (worse) vendors write them. A few days ago, Enterprising Investor actually had a blog post advocating payday loans!!!
Hi Robert,
Thanks for your comment. My article above actually discusses the keyword problem that you highlight and links to the original research, as well as the coverage from last fall about how managers are hiring speech coaches to strike offending key words out of their comms.
I was at pains in the article to link to original source materials so that interested readers could explore this topic for themselves. I feel that I would be remiss in raising a key issue that investors face and not also mentioning a solution. Also, because I am one of the co-researchers on the only scientifically based research on investment professionals’ abilities at deception detection, I am uniquely positioned to talk about much of this subject matter.
I would be happy to talk with you more about this subject as I can tell it is a passionate one for you.
With smiles,
Jason
Also, since you brought up the pricing of NLP technologies: We have priced D.A.T.A. low enough that a potentially aggrieved party on a dating site can afford a report.
No matter how cheap your product might be, the blog post is still an advertisement.
Recent CFA Enterprising Investor advertisements masquerading as blog posts have included an ad for Turing tech’s ensemble portfolio product, a post supporting payday loans, and a political diatribe to sell more ESG products. Other than ads, there was a CNBC-style puff piece about Warren Buffett’s favorite market indicator… written by a freelance journalist with zero financial market experience.
Why should members pay $300 per year member dues to receive thinly veiled advertisements? My spam inbox is filled with ads, and the ads are free.
Hi Robert,
Point taken, and thank you for sharing. That said, if I had bumped into you at a conference and had a discussion with you, could I have counted on you knowing:
1) That people are poor at detecting deception, with just a 54% success rate globally?
2) That our industry has just a 49.4% success rate at discerning truth from deception?
3) That the actual success rate is just 39.4% because we have a very high truth bias?
4) That numbers in annual reports make up just 13.5% of the content?
I thought so!
With smiles,
Jason
As I read the article, I wanted to say what Robert said. Thanks, Robert.
Tough crowd. But I appreciate the article for what it sets out to do – inform about text-based data in financial reporting. Thanks, Jason. Miss all your contributions here. I think your hundreds of EI articles attest to your integrity and helpfulness in communicating about issues that are important to this audience.
Hi Skot,
Nice of you to say, I appreciate it and even better, I am glad that you appreciated the piece.
With smiles,
Jason
Sometimes it may be simpler to look at trends in, for example, reported “goodwill” or inventories versus sales, mentally write the “text” yourself, and then compare THAT with the company’s communications.
Hi Kirk,
Yup, these techniques work, and so does D.A.T.A. 🙂 D.A.T.A. is meant as a complement to the work you do, not a substitute. Also, because computers do the scanning, your entire portfolio can be assessed in seconds, with higher accuracy, for signals that likely are not being examined right now.
With smiles,
Jason
As a student of various predictive analytics techniques, I found the article quite enlightening. These capabilities are future differentiators and whether you perceive this writing to be promotional or not, as investment professionals we should get as much exposure to them as we can.
Hi James,
Thank you for your comment. I think you are right. I have had many conversations over recent years with investment managers who are starting to open themselves up to the idea that they need to begin taking advantage of computing in their investment process. In my experience, the primary hangup has been that they see it as anti-human rather than pro-human. That is, it shakes ’em up at an existential level. Lackluster performance and pressure on fees seem like immutable laws at this juncture and are pushing them to consider new things.
With smiles,
Jason
If any of the respected names in our profession (AQR, Rayliant Global, Research Affiliates, GMO, etc.) write about their tools and methods, statistical facts, probabilities of outcomes, reasons for good or bad performance, we all know they are in business and charge for their services.
We need to consider all research that appears to be based on quality work. If it doesn’t interest us, we can drop it and get on with other activities. If it does interest us, we can investigate and verify to our satisfaction.