Within a year and a half of the release of ChatGPT, using Generative AI to augment or replace manual workflows has become commonplace for knowledge workers.
Nearly everyone has tried ChatGPT for something, developers and marketers in particular use more specialized tools for various use cases, and RAG (Retrieval-Augmented Generation) has become the standard way to augment generative workflows with enterprise data. In parallel, the capabilities of LLMs (Large Language Models) keep increasing: there have been two step changes since the launch of GPT-3.5, GPT-4o currently sets the standard, and OpenAI’s next frontier model is now in training.
Compared with the timelines of earlier transformative technologies, especially the Internet and mobile, the Generative AI transformation can still be regarded as being in its infancy. With this in mind, it makes sense to look at what is brewing next for Generative AI and where we should expect the most advancement in the near term.
Small Language Models
With all eyes currently on the largest and highest-performing LLMs, and OpenAI, Google, Cohere and Meta racing for the top spot, it is easy to overlook advances at the other end of the model spectrum. For some tasks the highest possible model quality is the deciding factor, but for many others it makes sense to trade some quality for higher throughput and lower price. This is especially true when LLM-driven functionality is embedded into software, where tasks tend to be very high in volume and highly repetitive.
This realization has driven the rise of SLMs (Small Language Models), which are far less compute-intensive than LLMs in both training and inference (use of the trained model). Whereas the largest models have hundreds of billions of parameters, small models typically have fewer than 10 billion, and they are trained on far smaller datasets. Thanks to advances in model design and higher-quality training data, their performance suffers far less than the parameter count would suggest. Because they consume less compute and energy than LLMs, SLMs can also steer Generative AI toward a more sustainable future.
Many SLMs can be used over an API as a drop-in replacement for an LLM at a much lower cost. More interestingly, many SLMs are open source or otherwise allow technically simple and inexpensive fine-tuning. With a technique called PEFT (Parameter-Efficient Fine-Tuning) in particular, a small model can be customized to a specific use case with only a small dataset, often delivering even higher performance on that specific problem than a general-purpose LLM would. As a neat trick, the training data can often be generated with an LLM, preferably under human oversight.
Their small size allows SLMs to run on-premise or even on the edge, which can be a major enabler of new use cases for language models. However, we need to stay aware of the drawbacks of the fine-tuned SLM approach. Much of the awe around LLMs, as well as their value proposition, stems from their being task-agnostic: they require no task-specific training data and no data scientist. LLMs also enabled true zero-shot use, where the model performs tasks it was never explicitly trained on. With SLMs, implementation can look more like the traditional data science process. We therefore expect both SLMs and LLMs to be here to stay.
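To make the size difference concrete, the following sketch estimates how much memory is needed just to store a model’s weights. It assumes a fixed number of bytes per parameter (2 for 16-bit weights, less when quantized) and ignores activations, KV cache and runtime overhead, so real requirements are higher. The 175-billion-parameter figure is an illustrative stand-in for a large model, not a claim about any specific product.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: a fixed byte size per parameter; activations, KV cache and
# runtime overhead are ignored, so real requirements are higher.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return the weight footprint in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Illustrative sizes: a typical SLM versus a hypothetical large LLM.
    for name, params in [("7B SLM", 7e9), ("175B LLM", 175e9)]:
        print(f"{name}: {weight_memory_gb(params):.0f} GB at fp16, "
              f"{weight_memory_gb(params, 'int4'):.1f} GB at int4")
```

Under these assumptions, a 7-billion-parameter model needs about 14 GB at 16-bit precision (3.5 GB at 4-bit), within reach of a single GPU or, once quantized, some edge hardware, while a 175-billion-parameter model needs about 350 GB for its weights alone.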
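As an illustration of why PEFT is cheap, the sketch below implements the forward pass of LoRA (Low-Rank Adaptation), one widely used PEFT technique. The text does not name a specific method, so the choice of LoRA, the class and function names and the default hyperparameters are our own assumptions. The pretrained weight matrix stays frozen and only two small low-rank matrices would be trained; the training loop itself is omitted.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k matrix is updated."""
    return d * k


def lora_params(d: int, k: int, rank: int) -> int:
    """Trainable parameters in LoRA: B is d x rank, A is rank x k."""
    return rank * (d + k)


class LoRALinear:
    """Linear layer computing y = W x + (alpha / rank) * B (A x)."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.weight = weight  # pretrained d x k matrix, kept frozen
        self.d, self.k = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        # A is small random noise and B is all zeros, so B (A x) = 0 and the
        # adapted layer initially reproduces the pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.k)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * dl for b, dl in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B would receive gradient updates during fine-tuning.
        return lora_params(self.d, self.k, self.rank)
```

For a single 4096 × 4096 weight matrix, full fine-tuning updates 16,777,216 parameters, while LoRA with rank 8 trains 8 × (4096 + 4096) = 65,536, about 0.4% of the total. In practice one would use an existing library such as Hugging Face’s `peft` rather than hand-rolled code like this.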
Open Models & Open Ecosystem
Alongside the rise of SLMs, we’re seeing a proliferation of open-source language models. In the context of openness, it’s worth remembering that the “Open” in OpenAI is a legacy of the company’s origins; today it is one of the most closed providers of LLMs. Also, not all open source is equal: fully replicating a model requires the training data, weights and training code to be available, but in practice many models are only partially open. With this in mind, the open-source phenomenon is still concentrated among SLMs, which are very often open. Led by Meta, Mistral and xAI, open models are also catching up with the performance of the best closed-source LLMs.
Given that closed-source LLMs are powerful and, for most use cases, fairly cheap to use, the question of why to choose open source also needs answering. At a high level, it boils down to four reasons that partially overlap with the reasoning for SLMs: confidentiality, customizability, cost and control. Especially when building software that embeds language models, these requirements can be non-negotiable, making closed-source licensed models a non-starter.
In addition to model development becoming more transparent, the broader ecosystem is also becoming more model-agnostic. Tools and frameworks such as LangChain, LlamaIndex, Haystack and Griptape make models increasingly interchangeable, so replacing one model with another takes a few lines of code or even just a configuration change. Some hyperscalers have noticed this too, with Google and Amazon in particular leaning towards more of an ecosystem approach.
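The interchangeability described above can be sketched without any particular framework: application code depends only on a small interface, and a configuration entry decides which model sits behind it. The class names, provider keys and model names below are hypothetical placeholders, not the API of LangChain or any other library; a real implementation would call a provider SDK or a local inference runtime inside `generate`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class ChatModel(Protocol):
    """The only interface the application depends on."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class HostedLLM:
    """Placeholder for a closed-source model behind an API (hypothetical)."""
    model: str

    def generate(self, prompt: str) -> str:
        # A real implementation would call the provider's SDK here.
        return f"hosted:{self.model}:{prompt}"


@dataclass
class LocalSLM:
    """Placeholder for an open small model served on-premise (hypothetical)."""
    model: str

    def generate(self, prompt: str) -> str:
        # A real implementation would run local inference here.
        return f"local:{self.model}:{prompt}"


# Registry mapping the provider key in the configuration to a constructor.
PROVIDERS: Dict[str, Callable[[str], ChatModel]] = {
    "hosted": HostedLLM,
    "local": LocalSLM,
}


def build_model(config: Dict[str, str]) -> ChatModel:
    """Instantiate whichever model the configuration names."""
    return PROVIDERS[config["provider"]](config["model"])


def summarize(model: ChatModel, text: str) -> str:
    # Application logic never references a concrete provider.
    return model.generate(f"Summarize: {text}")


if __name__ == "__main__":
    config = {"provider": "local", "model": "my-slm"}
    print(summarize(build_model(config), "Q3 report"))
```

Switching from the local model to the hosted one is then a one-line configuration change (`"provider": "hosted"`), with no change to `summarize`.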
LLM Agents
I’ve previously written an introduction to LLM agents, which still serves as a good first read on the subject. However, in the less than three months since that blog post, traction for agents has increased drastically, and agents are now the primary way of operationalizing LLMs. By a textbook definition, LLM agents are advanced AI systems that use an LLM as their central reasoning engine. As core features, agents often have a memory and a set of tools, and can have programmatic inputs and outputs. A useful way to generalize the concept is that agents turn LLMs from text predictors into functional software components.
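The definition above can be illustrated with a minimal agent loop: a reasoning engine decides the next step, tools perform actions, and a memory (here, a simple transcript) carries context between steps. This is a sketch under simplifying assumptions: the `CALL`/`FINAL` text protocol, the `word_count` tool and the scripted stand-in for the LLM are all hypothetical, and a real agent would send the transcript to an actual model and parse its response more robustly.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]
ReasoningEngine = Callable[[List[str]], str]  # transcript in, next step out


def run_agent(task: str, llm: ReasoningEngine, tools: Dict[str, Tool],
              max_steps: int = 5) -> str:
    """Loop until the reasoning engine produces a final answer."""
    memory: List[str] = [f"TASK {task}"]  # short-term memory: the transcript
    for _ in range(max_steps):
        step = llm(memory)
        memory.append(step)
        if step.startswith("FINAL "):
            return step[len("FINAL "):]
        if step.startswith("CALL "):
            parts = step.split(" ", 2)
            name = parts[1] if len(parts) > 1 else ""
            arg = parts[2] if len(parts) > 2 else ""
            tool = tools.get(name)
            result = tool(arg) if tool else f"unknown tool: {name}"
            memory.append(f"RESULT {result}")
    raise RuntimeError("agent did not finish within max_steps")


def scripted_llm(memory: List[str]) -> str:
    """Hypothetical stand-in for an LLM: call one tool, then answer."""
    last = memory[-1]
    if last.startswith("RESULT "):
        return "FINAL " + last[len("RESULT "):]
    task = memory[0][len("TASK "):]
    return f"CALL word_count {task}"


TOOLS: Dict[str, Tool] = {"word_count": lambda text: str(len(text.split()))}

if __name__ == "__main__":
    print(run_agent("count the words in this sentence", scripted_llm, TOOLS))
```

Running the demo prints `6`: the scripted model first requests the `word_count` tool, then returns the tool’s result as its final answer. Replacing `scripted_llm` with a function that calls a real model is what turns this loop into an LLM agent.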
LLM agent tooling comes in many layers:
1) tools for agent development (e.g. LangChain)
2) agentic platforms (e.g. AutoGen)
3) fully functional ready-made software agents (e.g. Devin by Cognition AI)
Products in layer 1 are practically always horizontal, while those in layer 3 always focus on one or a few verticals. In layer 2, however, we expect both horizontal and vertical solutions, with vertical agentic platforms offering higher abstraction and opinionated solutions that speed up the development of vertical-specific use cases. Even though the current discussion centers on LLM agents, there seems to be no fundamental blocker to SLM agents, e.g. for edge and industrial use cases.
In the short-to-medium term, LLM agents are set to take over more mundane activities, with the potential to automate activities that could free up as much as 70% of work hours. The starting point is the most isolated tasks with the least need for context, but we can expect LLMs to expand their capabilities in terms of context and discretion. Humans will still have a role wherever a high degree of context and discretion is required. Although concerns have been raised that AI will destroy jobs, there is good reason to believe this will be no different from earlier productivity shifts, with the human role evolving into something more interesting and valuable.
Conclusion
We at NGP Capital are actively tracking developments in Generative AI. Compared with the very capital-intensive LLM model layer, the Generative AI market as a whole is evolving in a more diversified direction, one that is aligned with our investment thesis and allows for impactful investments.
- Small Language Models (SLMs) are gaining traction, offering a balance between performance and efficiency, ideal for high-volume, repetitive tasks and for the edge.
- The Generative AI ecosystem is becoming more open, with a significant move toward open-source language models that offer greater confidentiality, customizability, and control in AI development.
- LLM agents have become the primary way of operationalizing LLMs; with advanced features such as memory and tools, they have the potential to automate activities that could free up as much as 70% of knowledge workers’ hours.
- Sustainability is a growing consideration, with SLMs offering more environmentally friendly options due to their lower computational requirements.
These key developments together point towards a future where Generative AI is increasingly used at the edge, whether as a part of a robot, in a factory, or in orbit.