ChatGPT: Copilot Today, Autopilot Tomorrow?
For more on artificial intelligence (AI) applications in investment management, read The Handbook of Artificial Intelligence and Big Data Applications in Investments, by Larry Cao, CFA, from CFA Institute Research Foundation.
ChatGPT and other large language models (LLMs) may someday automate many investment management and finance industry tasks. While that day is not here yet, LLMs are still useful additions to the analyst’s toolkit.
So, based on what we have learned about the new, dark art of prompt engineering, how can quant and fundamental analysts apply LLMs like ChatGPT? How effective a copilot can these technologies be?
Fundamental Analyst Copilot
Stock analysts generally know their companies from top to bottom, so ChatGPT may not reveal anything altogether new about their primary names. But LLMs can generate overviews of less well-known firms quickly and at scale.
Here are the ChatGPT prompts we’d deploy to analyze a hypothetical CompanyX.
- “explain the business model of CompanyX”
- “conduct SWOT analysis of CompanyX” (strengths, weaknesses, opportunities, threats)
- “list 10 competitors of CompanyX”
- “list the 10 main risks to an investment in CompanyX”
Environmental, Social, and Governance (ESG) Overview
- “list and describe 10 key Environmental scandals of CompanyX”
- “list and describe 10 key Governance scandals of CompanyX”
- “list and describe 10 key Social scandals of CompanyX”
- Drill down as appropriate
Latest Data Summary
- “summarize: [web address of text document, or paste in the text]”
- “list 10 key negatives” (risky unless we provide source text)
- Drill down as appropriate
We’d also add a standard ending to each prompt to increase the chances of an accurate response: “list your sources; if you do not know an answer, write ‘Do not know.’”
Now we can test some of these prompts in two simple case studies.
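The prompt lists and the standard ending described here lend themselves to a small helper. The following is a minimal sketch, not the workflow we actually used: the function names `build_prompts` and `build_summary_prompt` are hypothetical, and the templates are simply copied from the lists in this section.

```python
# Standard ending quoted in the article, appended to every prompt.
STANDARD_ENDING = "list your sources; if you do not know an answer, write 'Do not know.'"

# Prompt templates taken from the lists in this section; {company} is a placeholder.
OVERVIEW_TEMPLATES = [
    "explain the business model of {company}",
    "conduct SWOT analysis of {company}",
    "list 10 competitors of {company}",
    "list the 10 main risks to an investment in {company}",
]


def build_prompts(company, templates=OVERVIEW_TEMPLATES, ending=STANDARD_ENDING):
    """Fill each template with the company name and append the standard ending."""
    return [f"{t.format(company=company)}. {ending}" for t in templates]


def build_summary_prompt(source_text, ending=STANDARD_ENDING):
    """Build a 'summarize' prompt that carries its own source text."""
    return f"summarize: {source_text}\n\n{ending}"


prompts = build_prompts("CompanyX")
for prompt in prompts:
    print(prompt)
```

Keeping the ending in one constant means every prompt in a batch carries the same instruction, which makes responses easier to compare across companies.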
We ran the above ChatGPT analysis on two real-life companies (Mphasis, a lightly covered Indian mid-cap, and Vale, a very well-covered Brazilian mining company) and scored the results of each task on a one-to-five scale, with five being the highest. The answers were generated simply by prompting ChatGPT running GPT-4, but in practice, the most technologically advanced managers would automate much of this process. We would use multiple LLMs, which would give us more control over the responses, greater validation and cross-checking, and much greater scale. Of course, like all ChatGPT-produced results, those below need to be treated with care and not taken at face value, especially if we are relying on the model’s training data alone.
1. Mphasis Company Overview
While the results are hardly revelatory, ChatGPT does provide an informative, high-level summary of Mphasis. We also prompt it for sources and explicitly instruct it not to make things up. Such measures improve accuracy but are not foolproof.
As we proceed, the LLM offers up more interesting insights.
We can now drill down with a little SWOT analysis.
Our SWOT analysis identifies “Dependencies on Certain Industries” as a potential weakness for the company. So, we pose additional questions to help understand the underlying context.
Mphasis Company Overview Score: 4
2. Vale ESG Overview
Vale’s record on ESG issues has generated headlines, and ChatGPT picks up on the major themes. A simple prompt for a specific aspect — “Social” — yields accurate results, even though the system cautions that it cannot attribute sources and recommends we cross-reference the response. To get into more detail, we need to delve deeper than ChatGPT allows.
Vale ESG Overview Score: 3
Ground Truthing: ChatGPT Interrogates and Summarizes
Latest Mphasis Data Summary
ChatGPT can summarize and interrogate a company’s latest earnings call, news flow, third-party analysis, or whatever data we provide — this information is called the “ground truth,” which is a different use of the expression than in supervised machine learning. But if we don’t specify and deliver the text for ChatGPT to analyze, as we saw above, it will rely only on its training data, which increases the risk of misleading “hallucinations.” Moreover, the end-date of the LLM’s training data will limit the possible insights.
Another point to keep in mind: Official company communications tend to be upbeat and positive. So rather than ask ChatGPT to “summarize” an earnings call, we might request that it “list 10 negatives,” which should yield more revealing answers. ChatGPT delivers fast and effective results. Though they are often obvious, they may reveal important weaknesses that we can probe further.
Latest Mphasis Data Summary Score: 5
Quant Analyst Copilot
ChatGPT can write simple functions and describe how to produce particular types of code. In fact, OpenAI Codex, a descendant of GPT-3 trained on computer programming code, already powers the auto-complete coding tool GitHub Copilot, and GPT-4 will be the basis of the forthcoming and more comprehensive GitHub Copilot X. Nevertheless, unless the function is fairly standard, ChatGPT-generated code nearly always requires tweaks and changes for correct and optimized results and thus serves best as a template. So, for now, LLM autopilots appear unlikely to replace quant coders.
A quant might use ChatGPT for the three tasks described below. Here we are simply prompting ChatGPT. In practice, we would access code-specific LLMs and integrate other tools to create far more reliable code automatically.
1. Develop an Entire Investment Pipeline
ChatGPT can partly execute complex instructions, such as “write python functions to drive quant equity investment strategy.” But again, the resulting code may need considerable editing and finessing. The challenge is getting ChatGPT to deliver code that is as close as possible to the finished article. To do that, it helps to deploy a numbered list of instructions with each list item containing important details.
In the example below, we prompt ChatGPT to create five functions as part of a factor-based equities investment strategy and score each function on our five-point scale. For slightly higher accuracy, we would also construct a prompt for the system to “ensure packages exist, ensure all code parses.”
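To make the numbered-list approach and the “ensure packages exist, ensure all code parses” check concrete, here is a rough sketch using only the Python standard library. The helper names `numbered_prompt` and `check_generated_code` are our own illustrative choices, and the check is only a first filter: code that parses and imports cleanly can still be wrong.

```python
import ast
import importlib.util


def numbered_prompt(task, steps):
    """Render a task plus a numbered list of detailed instructions."""
    lines = [task]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append("ensure packages exist, ensure all code parses")
    return "\n".join(lines)


def check_generated_code(source):
    """Return (parses, missing_packages) for a string of Python source."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False, []
    top_level = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            top_level.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            top_level.add(node.module.split(".")[0])
    # A package counts as missing if Python cannot locate it in this environment.
    missing = sorted(m for m in top_level if importlib.util.find_spec(m) is None)
    return True, missing


prompt = numbered_prompt(
    "write python functions to drive quant equity investment strategy",
    [
        "download zip file, unzip, read csv into Pandas DataFrame",
        "using get_data_yahoo, read csv into Pandas DataFrame",
    ],
)
good_code = "import json\n\ndef f(x):\n    return json.dumps(x)\n"
bad_code = "def f(x)\n    return x\n"
print(check_generated_code(good_code))  # (True, [])
print(check_generated_code(bad_code))   # (False, [])
```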
1. Download Factor Time-Series Data
ChatGPT generates a decent function that downloads a zip file of factor data from the Kenneth R. French Data Library and extracts a CSV file. But we had to add nuanced instructions — “download zip file, unzip, read csv into Pandas DataFrame” — for ChatGPT to perform well.
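The “download zip file, unzip, read csv into Pandas DataFrame” instruction might translate into something like the sketch below. This is not the code ChatGPT produced. The URL is our assumption about where the monthly three-factor file lives and should be checked against the library's website, and the function names are hypothetical.

```python
import io
import urllib.request
import zipfile

import pandas as pd

# Assumed location of the monthly three-factor file; verify before relying on it.
FACTOR_ZIP_URL = (
    "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
    "F-F_Research_Data_Factors_CSV.zip"
)


def read_first_csv_from_zip(zip_bytes, **read_csv_kwargs):
    """Unzip in memory and read the first CSV member into a DataFrame."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in the archive")
        raw = archive.read(csv_names[0])
    return pd.read_csv(io.BytesIO(raw), **read_csv_kwargs)


def download_factor_data(url=FACTOR_ZIP_URL, **read_csv_kwargs):
    """Download the zip file, unzip it, and read the CSV into a DataFrame."""
    with urllib.request.urlopen(url) as response:
        return read_first_csv_from_zip(response.read(), **read_csv_kwargs)
```

Files of this kind often carry descriptive lines above the data and extra tables below it, so arguments such as `skiprows` or `index_col` would likely need to be passed through `read_csv_kwargs` and the result inspected by hand.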
2. Download Equity Returns Data
Again, the function ChatGPT writes does work. But again, we had to add more details, such as “using get_data_yahoo, read csv into Pandas DataFrame,” to make the function work properly.
3. Align the Dates in Our Downloaded Data
The data we downloaded, from the Kenneth R. French Data Library and Yahoo, have different date formats and frequencies. ChatGPT did not sort this issue for us, so we had to reformat dates and then write the code to align the two sets of data. This data wrangling is the most time-consuming and risky aspect of most data processes, and ChatGPT was of little help.
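As an illustration of the alignment step we had to write by hand, the sketch below assumes the factor data are monthly and indexed by integers such as 202301, and that the equity data are daily prices on a DatetimeIndex. Both are assumptions about the inputs, and the function name is hypothetical.

```python
import pandas as pd


def align_factors_and_returns(factors, prices):
    """Align monthly factor data with daily prices on a monthly PeriodIndex.

    factors: DataFrame indexed by integers such as 202301 (YYYYMM).
    prices:  DataFrame of daily prices indexed by a DatetimeIndex.
    Returns (factors, monthly_returns) restricted to the months they share.
    """
    factors = factors.copy()
    factors.index = pd.to_datetime(
        factors.index.astype(str), format="%Y%m"
    ).to_period("M")

    # Last price in each calendar month, then simple month-on-month returns.
    month_end = prices.groupby(prices.index.to_period("M")).last()
    returns = month_end.pct_change(fill_method=None).dropna(how="all")

    # Keep only the months present in both data sets, in date order.
    common = factors.index.intersection(returns.index).sort_values()
    return factors.loc[common], returns.loc[common]
```

The function does not reconcile units. If one source reports percentages and the other decimals, that rescaling still has to be done explicitly before any regression.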
4. Use a Simple Factor Model to Forecast Returns
With ChatGPT, we can calculate stock-level factor loadings, but the expected returns it produces are based on the same factor returns we used to fit the model, so they are closer to in-sample fitted values than to forecasts. This is not helpful. So, we have to investigate and understand where ChatGPT went awry and manually fix it.
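One possible way to separate the two steps is sketched here under our own assumptions: loadings are estimated by ordinary least squares on an estimation window, and the factor premia used for the forecast are passed in as an explicit, separately chosen input. The function names are hypothetical, and this is not necessarily the fix we applied.

```python
import numpy as np


def estimate_loadings(stock_returns, factor_returns):
    """OLS loadings of each stock on the factors (intercept included).

    stock_returns:  array of shape (T, N)
    factor_returns: array of shape (T, K)
    Returns (alphas of shape (N,), betas of shape (K, N)).
    """
    X = np.column_stack([np.ones(len(factor_returns)), factor_returns])
    coefs, *_ = np.linalg.lstsq(X, stock_returns, rcond=None)
    return coefs[0], coefs[1:]


def expected_returns(betas, factor_premia):
    """Expected return per stock: loadings times the assumed factor premia."""
    return np.asarray(factor_premia) @ betas
```

Making `factor_premia` an argument forces the analyst to state where the premia come from, for example averages over data that end before the forecast date, rather than silently reusing the fitting sample.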
5. Construct Portfolios and Run Simulations
The final simulation function misfires. It fails to generate expected returns for all of our stocks over all time periods in our data and isn’t an effective guide for portfolio construction decisions. It just calculates one expected return value for each stock.
We must intervene to loop through each time period and engineer the function to do what we want it to. A better prompt makes for better results.
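The loop we had in mind could look roughly like the sketch below. It is an illustration under assumptions of our own: a fixed rolling window, factor premia taken as the window's average factor returns, and a simple top-n equal-weight portfolio rule. None of these choices comes from ChatGPT's output, and all function names are hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_expected_returns(returns, factors, window=36):
    """Expected returns for every stock at every date after the first window.

    For each date t, loadings and average factor returns are estimated from
    the `window` periods before t only, so no forecast uses its own period.
    """
    if len(returns) <= window:
        raise ValueError("need more observations than the window length")
    rows = []
    for t in range(window, len(returns)):
        F = factors.iloc[t - window:t].to_numpy()
        R = returns.iloc[t - window:t].to_numpy()
        X = np.column_stack([np.ones(window), F])
        coefs, *_ = np.linalg.lstsq(X, R, rcond=None)
        rows.append(F.mean(axis=0) @ coefs[1:])
    return pd.DataFrame(
        np.vstack(rows), index=returns.index[window:], columns=returns.columns
    )


def top_n_equal_weight(forecasts, n=2):
    """Equal-weight the n stocks with the highest forecast at each date."""
    ranks = forecasts.rank(axis=1, ascending=False, method="first")
    picks = (ranks <= n).astype(float)
    return picks.div(picks.sum(axis=1), axis=0)


def simulate(returns, weights):
    """Portfolio return per period, using weights formed from prior data."""
    return (weights * returns.loc[weights.index]).sum(axis=1)
```

The output of `rolling_expected_returns` is a full panel, one row per period and one column per stock, which is the shape the original function failed to deliver.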
Develop an Entire Investment Pipeline Score: 1
2. Create a Machine-Learning, Alpha-Forecasting Function
Follow-up requests give us a simple machine-learning function, or template, to forecast stock returns. ChatGPT does a reasonable job here. It provides a function that we can then adjust and offers advice on how to apply it, recommending cross-validation for a random forest.
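A template in the spirit of what ChatGPT suggested might look like the following. This is our own minimal sketch using scikit-learn, not ChatGPT's output. The use of `TimeSeriesSplit` for the cross-validation, the hyperparameters, and the function name are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score


def fit_alpha_forecaster(features, forward_returns, n_splits=5, random_state=0):
    """Cross-validate and fit a random forest that forecasts stock returns.

    features:        array of shape (n_samples, n_features), in time order
    forward_returns: array of shape (n_samples,), the next-period returns
    Returns (fitted model, array of out-of-fold mean squared errors).
    """
    model = RandomForestRegressor(
        n_estimators=100, min_samples_leaf=5, random_state=random_state
    )
    # Each fold trains on earlier rows and validates on later ones.
    cv = TimeSeriesSplit(n_splits=n_splits)
    scores = cross_val_score(
        model, features, forward_returns, cv=cv, scoring="neg_mean_squared_error"
    )
    model.fit(features, forward_returns)
    return model, -scores
```

A time-ordered split is used here because shuffled folds would let the model train on observations that come after the ones it is validated on.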
Create a Machine-Learning, Alpha-Forecasting Function Score: 4
3. Create a Useful Function: Target Shuffling
We next ask ChatGPT to write a helpful and moderately complex function to conduct target shuffling. Target shuffling is a method to help verify an investment model’s outcomes. A simple request to “write Python code for a target shuffling function” does not give us much. Again, we had to input a detailed list outlining what we want for ChatGPT to produce a reasonable template.
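For readers unfamiliar with the technique, a bare-bones version is sketched below. It assumes the model's outcome can be summarized as the correlation between a signal and the returns it is meant to predict. A real application would substitute its own performance metric, and the function name and defaults are ours.

```python
import numpy as np


def target_shuffling(signal, target, n_shuffles=1000, seed=0):
    """Estimate how often chance alone matches the observed signal/target link.

    The performance metric here is the correlation between signal and target.
    Returns (observed metric, share of shuffles with a metric at least as high).
    """
    signal = np.asarray(signal, dtype=float)
    target = np.asarray(target, dtype=float)
    rng = np.random.default_rng(seed)

    observed = np.corrcoef(signal, target)[0, 1]
    shuffled = np.empty(n_shuffles)
    for i in range(n_shuffles):
        # Shuffling the target breaks any real relationship with the signal.
        shuffled[i] = np.corrcoef(signal, rng.permutation(target))[0, 1]

    # Add one to numerator and denominator so the share is never exactly zero.
    p_value = (np.sum(shuffled >= observed) + 1) / (n_shuffles + 1)
    return observed, p_value
```

A small share suggests the observed result would rarely arise by luck, while a large share is a warning that the model's apparent skill may be noise.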
Create a Useful Function: Target Shuffling Score: 5
As an adjunct to a fundamental analyst, ChatGPT functions reasonably well. Though detail is sometimes lacking on less-well-covered companies, the stock summaries demonstrate ChatGPT’s speed and precision as an aggregator — when queries require no reasoning, subjectivity, or calculation. For ESG applications, ChatGPT has great potential, but once we identified a controversy, we could only drill down so far as the system only had so much data.
ChatGPT excels at quickly and precisely summarizing earnings transcripts and other long-form text about companies, sectors, and products, which should free up time for human analysts to dedicate to other tasks.
While ChatGPT seems to disappoint as a quant copilot, it does add some value. To produce complex pipelines, ChatGPT needs precise prompts that require considerable time and intervention to construct. But with more specific functions, ChatGPT is more reliable and can save time. So overall, ChatGPT’s effectiveness as a copilot is largely a function of how well we engineer the prompts.
However, if we step things up and build an application on top of GPT-4, with refined prompts, cross-validated results, and structured outputs, we could significantly improve our results across the board.
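As a rough indication of what “structured outputs” and “cross-validated results” could mean in such an application, the sketch below validates JSON responses against a small schema and accepts an answer only when several responses agree. The schema, the agreement rule, and the function names are hypothetical, and no LLM API is called: the responses are passed in as plain strings.

```python
import json
from collections import Counter

REQUIRED_KEYS = {"answer", "sources"}  # hypothetical response schema


def parse_structured(raw):
    """Parse one model response; return None unless it matches the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return None
    if not isinstance(data["answer"], str) or not isinstance(data["sources"], list):
        return None
    return data


def cross_check(raw_responses, min_agreement=2):
    """Accept an answer only if enough valid responses agree on it."""
    valid = [r for r in map(parse_structured, raw_responses) if r is not None]
    answers = [r["answer"].strip().lower() for r in valid]
    counts = Counter(a for a in answers if a != "do not know")
    if not counts:
        return None
    answer, votes = counts.most_common(1)[0]
    return answer if votes >= min_agreement else None


# Made-up response strings, standing in for the output of several models.
responses = [
    '{"answer": "Sector A", "sources": ["document 1"]}',
    '{"answer": "sector a", "sources": []}',
    '{"answer": "Do not know", "sources": []}',
    "not json at all",
]
print(cross_check(responses))  # sector a
```

Exact string matching is a crude agreement test. A production system would need a more forgiving comparison, but the principle of rejecting malformed or unsupported answers is the same.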
Professional Standards, Regulation, and LLMs
What sort of implications do LLMs have for professional standards and regulation? In “Artificial Intelligence and Its Potential Impact on the CFA Institute Code of Ethics and Standards of Professional Conduct,” CFA Institute raised important questions about LLMs’ investment management applications, and there are obvious concerns about appropriate risk management, interpretability, auditability, and accountability around LLMs.
This is why the direct and uncontrolled application of ChatGPT responses to investment decision making is currently a nonstarter. But the technology is moving fast. Alphabet, for example, is working to provide sources for LLM responses. Further developments in so-called machine reasoning and causal machine learning may widen LLMs’ applications still further. Nevertheless, current, raw LLM technology cannot satisfy the duty of care obligations intrinsic to investment management. That is why, absent access to the most sophisticated resources that can implement cross-validated and checked LLM responses, we advise against anything but the most peripheral use of LLMs.
LLMs: Future Applications in Investment Management
If analysis and investment indeed compose a mosaic, LLMs provide managers who understand the technology with a powerful tile. The examples above are simply ChatGPT prompts, but developers and managers with class-leading technology are already working to apply LLMs to investment management workflows.
In investment management, LLMs may already be at work on the following tasks:
Portfolio managers could sense check investments with LLMs at a portfolio or even asset allocation level based on such criteria as ESG scandals or investment risks. This could ultimately be extended to institutional investing and robo-advisers.
LLMs can help fundamental analysts quickly acquire basic knowledge about many companies at once. And quant analysts can use them to develop and debug code. Of course, there are risks and drawbacks that need to be carefully managed. The ChatGPT prompts we use above show one way to do this manually, but apps that write prompts automatically are likely to be available soon and should help achieve more detailed and specific objectives. Indeed, we expect a new tech arms race to develop.
Ultimately, higher-tech systematic managers will harness LLMs to automate the research that fundamental analysts would otherwise conduct, and they will use this output as another input to their stock selection and investment models. For this to work, LLMs’ flaws, particularly those related to timeliness and logical or causal reasoning, will have to be addressed.
But even in their current form, well-integrated LLMs can create significant efficiencies if applied in the right way. And they hint at the technology’s vast potential.
In its next generation, LLM technology will become an indispensable investment management tool. By automating information gathering and other tasks, it will give human analysts more time and bandwidth to focus on the reasoning and judgment side of the investment process. This is only the beginning.
All posts are the opinion of the author(s). As such, they should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute or the author’s employer.