I have spent the summer learning more about the new AI tools that became available this year and actually experimenting with them on various pet projects that have been lying around my desk for a while (ranging from Fitbit and Duolingo to real estate listings and EDGAR filings). The idea was to get a good feel for the limits of AI as they stand today. I talked to colleagues in tech, finance, and law to understand how they use AI in their workflows. In between, I read some research papers to get a more scientific perspective on the matter, too.
With new AI tools cropping up constantly, it is hard to keep track. Essentially, all the big players refreshed their models this summer. Yet, there were no truly groundbreaking improvements. Nevertheless, AI is making its way into more and more workflows. Just a couple of days ago Koyfin launched a cool feature — AI Transcript Summaries. Pretty much everyone I have talked to has already incorporated AI into their daily life. As you would expect, work-related use is most common in the tech sector, but people in finance and law are also experimenting with it.
The reason I am writing this post is twofold. First, to establish some basic ground truth, which will help you tune your AI-bullshit-meter when listening to conference calls and reading annual reports, most of which mention AI in some shape or form. Second, to share how I use AI in my research process.
Expectations vs Reality
Determinism
It may seem trivial, but I find this the most vital concept to keep in mind when working with AI. The key difference between the software tools we have used in the past and the AI tools that have spread like wildfire recently is determinism — a topic most AI users struggle with. On the surface, everyone knows that AI can generate a different response to the same question. Yet it is extremely common to meet people who are disappointed that AI does not give them the right answer, or a consistent answer, or that it even lies or “hallucinates,” to use the lingo. It is surprising how often people are caught off guard by something that is supposed to be common knowledge.
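You can see this for yourself with a few lines of code. Here is a minimal sketch, assuming the OpenAI Python SDK, an API key in your environment, and an illustrative model name; any chat model and any question will do:

```python
# Minimal sketch: the same prompt, sent twice, usually comes back worded
# differently, because the model samples from a probability distribution
# instead of looking up a single fixed answer.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = "In one sentence, what drives a company's return on invested capital?"

for attempt in (1, 2):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name, not a recommendation
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # sampling on; temperature=0 is more repeatable, not more correct
    )
    print(f"Attempt {attempt}: {response.choices[0].message.content}")
```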
This is a feature, not a bug. It is essential to the richness of what AI can offer and to what makes it so human-like. It also means that AI requires a different approach. Luckily, this approach is quite intuitive for us, thanks to our experience communicating with people, who tend to be quite non-deterministic themselves. I will come back to the key principles for working with AI in a moment.
It is such a common pitfall that the scientists at OpenAI had to publish a paper titled Why Language Models Hallucinate on September 4. In the paper they highlight that hallucinations are fundamentally inevitable for current large language models. Even with perfect data and training, mathematical constraints make it impossible to fully eliminate them.
Why do you think hallucinations occur? It is for the very same reason that people so often lie and fail, because they (people and AI) are incentivized to be certain and sound certain even when there is nothing to be certain about. AI is the quintessential consultant — always ready to authoritatively weigh in on any complex problem without a trace of uncertainty.
Models are rewarded during training and evaluation for guessing, not for admitting uncertainty. The dominant scoring systems give a wrong answer and an “I don’t know” the same zero credit, so models learn to make plausible guesses rather than abstain.
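A back-of-the-envelope illustration of that incentive, with a made-up hit rate, shows why guessing is the dominant strategy under this kind of scoring:

```python
# Toy arithmetic: if a correct answer scores 1 and everything else scores 0,
# a plausible guess beats abstaining no matter how unlikely it is to be right.
p_guess_correct = 0.25  # hypothetical chance a confident-sounding guess is right

expected_score_guess = p_guess_correct * 1 + (1 - p_guess_correct) * 0  # wrong answers score 0
expected_score_abstain = 0.0  # "I don't know" also scores 0

print(expected_score_guess, expected_score_abstain)  # 0.25 vs 0.0: always guess
```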
The OpenAI scientists suggest changing evaluation benchmarks so models are rewarded for admitting uncertainty instead of guessing, which, by the way, still cannot eliminate hallucinations entirely given the underlying math. This modest proposal sounds awfully familiar. I have read the same advice in dozens of books about investing. Legends like Warren Buffett, Howard Marks, and Seth Klarman stress the importance of recognizing and admitting uncertainty. You all know for a fact that successful investors avoid overconfidence and acknowledge what they don’t know. Except only a tiny minority actually practice this. Somehow, the social incentives push in the opposite direction. People crave certainty. They are willing to pay huge sums of money to be lulled into a sense of certainty, which means control — even if it is fake. Because control is vital to happiness. As Dan Gilbert says in Stumbling on Happiness about the illusion of control:
“In fact, the one group of people who seem generally immune to this illusion are the clinically depressed, who tend to estimate accurately the degree to which they can control events in most situations.”
That’s what will happen. We will get clinically depressed AI if we try to make it less confident. We will also lose the user base, because, despite what they will tell you, no one wants to pay good money to talk to a factually accurate but extremely uncertain and depressive AI that sounds like Marvin from The Hitchhiker’s Guide.
Key Takeaways from the Literature
I have read a number of research papers addressing the burning question — is AI as good as they say? Not surprisingly, and not very helpfully, all the research I found comes from the tech sector. But it is what it is. Other sectors are being cautious and moving more slowly.
Here are the key takeaways from the papers that I have read (and those I just summarized with AI):
Most developers now use AI tools, but trust in AI-generated code is rapidly declining, with 45% frustrated by solutions that are “almost right, but not quite”.
While AI boosts productivity for new code by 30-40%, much of this code needs significant rework, resulting in net gains closer to 15-20%.
The longer AI assistants debug code, the less effective they become, with sharp drops in performance after multiple attempts (a phenomenon called “debugging decay”).
AI is most effective when enabling self-service analytics and automating well-defined, repetitive tasks, rather than trying to replace human expertise.
These insights suggest a measured optimism for AI. Real gains exist, but hype-driven expectations should be tempered with practical understanding of both its strengths and pitfalls. This is something that I can believe and something that I sense (but haven’t measured) based on my experience with AI.
I find the MIT Media Lab report, stating that “95% of AI implementations failed”, hard to believe. Failure was defined narrowly as “no direct, measurable P&L impact within six months.” I can see how this statement can be true while completely missing improvements to individual productivity or operational efficiency that don’t show up directly in the P&L, and definitely not within six months. Developers may have low emotional intelligence, but they are not stupid. Why would they take on more work for the same pay, even if AI is allegedly doing this work? I also suspect these initial projects were low-value experiments and PoCs that had zero chance of moving the P&L needle by design. Finally, throwing a poorly formulated, data-starved, and possibly underfunded project at AI and hoping for the best sort of invites this kind of result, don’t you think? I have seen this happen over and over again with machine learning projects, which were yesterday’s (about a decade ago in calendar years) panacea, for those who missed it. Companies that never invested a cent in structuring, cleaning, and enhancing their data were all of a sudden sorely disappointed by the shortcomings of ML models, reporting similar failure rates. You can imagine my surprise at their results this time around.
Principles to Maximize the Return on Your AI Efforts
Now, back to the promised key principles to maximize clarity, relevance, and effectiveness of your AI efforts. I have no doubt that everyone and their cousin is shelling the AI data centers with prompts on any imaginable topic. Equity analysts, being the cutting-edge, ahead-of-the-curve bunch that they are, now have a tool to delegate the laborious digging to. Of course, the problem is the age-old one of standing on tiptoes at a concert. It is great, and it works — until everyone around you starts doing it.
The other problem, also quite common if you have tried using AI for investment research, goes like this. You ask ChatGPT to analyze a stock, it spits out what looks like a comprehensive report, and then you realize half the numbers are wrong and the other half are from 2019. Welcome to the club. AI research tools have some pretty consistent blind spots that will trip you up if you are not careful.
Here are a few key principles that will help you get the best out of your favorite AI.
As Snoop said
Be Specific
Clearly state what you want the AI to do, avoiding ambiguity or open-ended requests. The more specific your instructions, the better the response. Keep in mind that without explicit and specific instructions, AI will act like an overeager analyst who jumps straight into execution without thinking twice about whether they understand the task or how to approach it.
Provide Plenty of Context
Supply any relevant background or situational details — such as desired format, audience, or use case — to help the AI tailor its answer. Just as you would give a much better answer when you have a solid grasp of the context, so will your AI assistant.
Aim for Conciseness
Be direct and avoid unnecessary or verbose instructions. Try to balance being detailed and being brief for best results. This one is a little at odds with my last point, but bear with me. You have to provide all the relevant context and nothing extra. Don’t dilute your message.
Provide Clear Constraints
Ask the AI to act from a particular perspective (e.g., “Act as if you’re an equity analyst”), specify preferred format (e.g., list, table, summary) and sources. Keep in mind that AI research tools only work with what they have been trained on and what is freely available online. All those expensive Bloomberg terminals, S&P Capital IQ datasets, and professional research reports? Not accessible.
Tell the AI not to make the answer up if it doesn’t know. I have already talked too much about this, but it is vital to get factual answers, and I assume this is what you are after as an investor.
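Putting these constraints together, here is what such a prompt might look like; the sector, the document, and the wording are placeholders for illustration, not a template I am endorsing:

```text
Act as an equity analyst covering consumer staples.
Using only the attached 10-K, build a table of revenue growth, operating margin,
and free cash flow for the last three fiscal years, citing the page or section
for every figure. If a number is not in the document, write "not available"
instead of estimating it. Keep any commentary under 200 words.
```

The perspective, the format, the source restriction, the length cap, and the "not available" escape hatch each remove a degree of freedom the model would otherwise fill with a confident guess.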
Iteration and Clarification
Related to the above, make the AI ask clarifying questions when it is in doubt or finds the instructions unclear. Iterate. If the initial output is not what you wanted, refine your prompt or ask targeted follow-up questions. Learning through iterations often produces stronger answers.
Reboot
It is not uncommon to get stuck in a loop or for the answer quality to decrease as you iterate. At this point, you will get better results if you start over in a new session with the summary and lessons learned from the previous one.
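In practice the handover can be as simple as asking the stale session to write its own summary and seeding a brand-new one with it. A hedged sketch, using the same assumed SDK and placeholder names as above:

```python
# Sketch of the reboot workflow: have the old session compress itself, then
# start a clean conversation from that summary instead of the full history.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative model name

# The conversation that has started to drift (ACME Corp is a placeholder).
old_history = [
    {"role": "user", "content": "Walk me through ACME Corp's last three 10-Ks."},
    {"role": "assistant", "content": "...a long analysis that has gone stale..."},
]

# 1) Ask the old session for a handover note: findings, open questions, dead ends.
summary = client.chat.completions.create(
    model=MODEL,
    messages=old_history + [{
        "role": "user",
        "content": "Summarize our findings so far, the open questions, "
                   "and the approaches that did not work.",
    }],
).choices[0].message.content

# 2) Start a fresh session seeded only with that summary.
fresh = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are helping with equity research. "
                                      "Context from a previous session:\n" + summary},
        {"role": "user", "content": "Pick up from the first open question."},
    ],
)
print(fresh.choices[0].message.content)
```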
Which AI?
At this point, you are probably asking yourself why you read all this when there is no mention of specific AI tools. So I feel compelled to share my thoughts on the topic, although, to be completely honest, I don’t think the tools make the man (or the investment analyst, in this case). You can get quite similar results from any of the major models. Certainly, there are flavors, personalities, vibes, and things that some do better than others. Here is my take.
I prefer Perplexity for deep research. It excels at giving you clean, readable reports with proper citations. Checking the citations is the best way known to man of making sure your AI isn’t hallucinating. This leaves you with the problem that the original author might have hallucinated.
Sticky problem, these hallucinations. If you really want to stay grounded and work off of first principles and primary sources, you have to conscientiously and singlehandedly select credible sources and plug them into Notebook LM, which is constrained to the sources you give it.
If you need to analyze earnings reports, 10-Ks, or other lengthy documents, Claude is also a great choice. It can handle much longer texts while maintaining consistency throughout the analysis.
Here is something you didn’t expect. BlackRock found that newer versions of GPT actually performed worse on financial prediction tasks than earlier versions. The models are being optimized for general conversation, not financial accuracy. This is why purpose-built tools often outperform general AI for specific financial tasks.
BlackRock uses an AI-powered tool that combines LLMs and big data to quickly build equity baskets around investment themes (such as weight-loss pharmaceuticals). Human analyst expertise is involved throughout, ensuring quality and correcting any AI errors.
AI tells me that there are purpose-built AI tools for financial analysis and stock prediction accessible to the general public. I haven’t tested any, but it could be a fun exercise.
Conclusion
Here’s the bottom line. AI can genuinely improve your investment research process, but only if you use it thoughtfully. It is not deterministic, but neither is the world of investing. If you know how to work with probabilistic outputs, you should be quite comfortable with AI.
The key insight is treating AI like a very capable but sometimes unreliable research assistant. It can process information faster than you ever could, identify patterns you might miss, and help you formulate better questions. But it can’t replace judgment, and it definitely can’t replace verification.
AI is here to make you a better investor, not to make investment decisions for you.