Exploring the Modern AI Paradigm

Introducing the Modern Artificial Intelligence Primer


[ ♪ exciting music ♪ ] Who's going to benefit the most, Catherine, from using this primer and deploying the insights from within it?

I think there are probably three different types of audiences that can really gain a lot. The first would naturally be government leaders who are really on the cusp of bringing and integrating AI in, but who want a much more accelerated understanding. I see our primer as a bit of a boot camp in a box, or a boot camp in a PDF, where you can get dangerously smart reading its content. The second type of audience would be actual practitioners like Alison and Ed and myself. There's a lot of goodness in the language and in the content that can take you on a journey from some basic 101 to getting really in the weeds and understanding different components like optimization, loss functions, and a little bit of reinforcement learning.

With that, I want to turn to the original primer and maybe get a perspective from you, Ed, on the similarities and differences that have unfolded and how we have evolved the document over those years.

Yeah, I think a lot of it aligns with the larger use of AI and machine learning today. Before, the application was more narrow, more scoped, and required more of a cookie-cutter approach: "It's got to fit the mold, and if it doesn't fit the mold, it's not going to work." Just standard classification, regression, some clustering. The scope of what machine learning and AI techniques are applicable to now is so much wider than it was before. With generative AI, and really newer AI techniques more broadly, there's a lot more flexibility in the types of problems you can tackle. And that's what's changed in the primer: covering the breadth of scope and possibilities that you can pursue today that weren't really viable before.

So there's a lot of technical detail in the primer, which leads me to ask: how technical does one need to be to really access this and gain insight from it? Alison, maybe I could turn to you and ask, how technical do you have to be, actually?

Definitely not very. You don't even need to be an engineer to use generative AI effectively. In fact, anyone can open up a browser, navigate to Anthropic's Claude or OpenAI's ChatGPT, and interact with generative AI today. I'm pretty sure my parents could do it, which is, you know, telling. That said, it's still really important that people become more familiar with its capabilities and have a general understanding. And I think there are a number of reasons why that extra knowledge, again not technical, just a little bit more skill, is needed. One, you can improve your outputs dramatically if you understand how the model is working; that's really important for your efficiency and quality of output. Two, you can use it more responsibly if you're aware of its limitations, for example, hallucinations. If you're aware that hallucinations might occur in a large language model, you're more likely to be critical of its output. And lastly, and this is kind of a nuanced thought, if you know the limitations, you have better expectations and a better user experience overall, and you know the right tasks to give it. [ decoder noise ]

Learning from the History of AI


[ ♪ exciting music ♪ ] So what can we learn from studying AI history? For many, it's surprising how far back AI goes. I was really surprised by the depth of the history that you brought forward in the primer and how important it was to telling the story of where we are today in this AI journey.

This is actually a journey of iterative improvement over time that brought us to something like a GPT. So I think there are probably three misconceptions. The first is not realizing that much of the advanced AI you see today is really built on a foundation of pretty basic math, like optimization models, for example, and search and planning algorithms, and that these iterative improvements, in addition to the things Ed mentioned, like accelerated compute and volume of data, have allowed us to do the type of advanced generative AI that we do today. The second misconception is that it takes a special type of knowledge, a special type of skill, a special mathematical or engineering background to be able to fathom all of this information. As Alison mentioned, there are actually many legs to it, and many skill sets are brought to bear here. Some of it isn't even math or quantitative; some of it pertains to fields like sociology and philosophy, and to communications. And the third misconception is to think that we're not going to improve any more, that maybe this is the apex of what AI will reach.

Well, I think the key thing is there are peaks and valleys in this whole process. Ed, maybe you could talk a little bit about this, right? We've been through the AI winters. Maybe talk about that and where we are today.

Yeah, for people who aren't aware, the AI winters are basically these periods of time where the community got really excited about AI and its potential, a lot of funding went into it, and expectations did not meet reality. Funding dried up. Some stubborn people kept going. Then there were new breakthroughs, new excitement, and we popped back up, again and again. And the expectations not matching reality is the key point, because there was still a lot of value each and every time this happened. There were real things being accomplished. One of the first AI projects that DARPA funded saved the government hundreds of millions of dollars.

Coming off of that conversation, Alison, maybe we can talk a little bit about where we are in the hype cycle with generative AI.

Looking at the hype cycle, we have an emerging, breakthrough technology, and there's this exuberance around its potential, then a subsequent crash of disillusionment, and then a more measured, optimistic "okay, we can do something," which eventually plateaus to a mature technology. The hype cycle is largely about people's expectations, but also this implication that companies and people are overstating the capabilities of generative AI. And I don't think that tracks today. I don't think we're overstating it. What we're actually seeing is more of an economic phenomenon where companies are investing left of the curve. We've heard Mark Zuckerberg talk about this with Meta, where he basically said, I'd rather be a little early than completely miss the boat. And so we're seeing a ton of investment, which is generating that buzz, which is different from overstating capabilities, I think. [ decoder noise ]

Exploring the Technologies Behind AI


[ ♪ exciting music ♪ ] Given the significant escalating costs that you're seeing in this generative world, with foundational model development being so pricey now, are we pricing players out of it? And how do we deal with those risks?

Epoch AI recently released a report estimating not only the cost but the feasibility of continuing to scale these frontier models at the current rate through 2030. From a technical feasibility standpoint, looking at things like chip availability and performance, latency, even power grid availability, they found it was feasible to maintain this rate of scaling. However, they estimated the cost of that next frontier model in 2030 at hundreds of billions of dollars. And already you're thinking, even for these cash-rich tech companies, a $100 billion bill is going to be a lot. Can they stomach it? So I think the economics are actually going to play a pretty big role there, which, as you're mentioning, could be a barrier to entry for many smaller organizations that are not as well funded.

Let's pivot a little and talk about one of the other things in the primer I thought was really interesting: all the different amalgamations of technologies that have to come together to make this successful, from algorithms to chips, GPUs, CPUs. Catherine, maybe I'll turn to you: whether it's transformers, attention, or GPUs, it's hard to choose favorites, but what do you think is really behind the radical acceleration of AI right now?

Accelerated compute, GPUs, TPUs, and the next-generation chips Alison mentioned are really vital and critical when it comes to training these models. These are very hungry models that take quite a long time to train. When we look at things like GANs, diffusion models, and GPT models, these have a lot of parameters, meaning there are a lot of weights, a lot of data, a lot of information to learn. As a result, we can't use just a small GPU card; we have to accelerate, and there are many different types of computation. No one GPU card fits them all. There are many different sizes and configurations, and this type of infrastructure is incredibly important for training these algorithms, either from scratch or to do fine-tuning, which updates the model based on new data.

So, Ed, let me go to you. What do you think are the most significant technical innovations in this space right now?

Yeah, to add on to what Catherine was saying, there are also strategies we've been developing inside of deep learning that have made a big difference in what we've achieved. Self-supervision is one of them, where the question is: how can I find some way to get labels for free? If I want to build a cat detector, I've got to get a million pictures of cats and get someone to mark all of them as cats. I also have to get a million pictures of other things and mark what they are. That's a large amount of manual effort, and that's the way we did it originally. Then we realized that if I block out one part of the image and say, predict one half of the image from the half I give you, that doesn't require anyone to do any work; I can just automate that. But it's a good enough self-supervised label that we can build one of these backbones, then use a small amount of manually labeled data to build our final solution, and have it work almost as well as, or sometimes even better than, going and manually labeling a million images. Different strategies like this, where we're thinking about how to frame problems differently, are a big component of the successes we've seen. [ decoder noise ]
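Ed's "predict one half from the other half" idea can be sketched in a few lines. The snippet below is a minimal illustration, not the primer's implementation: it shows how unlabeled images yield free (input, target) training pairs, the setup a self-supervised backbone would then learn to invert.

```python
import numpy as np

def make_self_supervised_pair(image: np.ndarray):
    """Turn one unlabeled image into a free (input, target) training pair.

    The left half of the image becomes the model input; the right half
    becomes the label it must predict. No human annotation is needed.
    """
    w = image.shape[1]
    visible = image[:, : w // 2]  # what the model sees
    target = image[:, w // 2 :]   # what it must reconstruct
    return visible, target

# A batch of 1,000 unlabeled "images" yields 1,000 labeled pairs for free.
# A backbone pretrained on such pairs can then be fine-tuned with only a
# small amount of manually labeled data, as Ed describes.
images = np.random.rand(1000, 32, 32)
pairs = [make_self_supervised_pair(img) for img in images]
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)
```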

What Sets Large Language Models and Generative AI Apart?


[ ♪ exciting music ♪ ] Catherine, could you start by explaining some of the differences between the legacy, traditional core AI and this emergent generative AI?

The path from traditional machine learning, supervised and unsupervised learning and those concepts, through neural networks and some of the more, dare I say, rote types of AI, to what we now see as gen AI was not so much a huge moonshot as it was iterative improvement. Everything we see in generative AI algorithms today, the way we train them, the way we fine-tune them, the way we build them, is based on fundamentals we know, like loss functions and optimization. Transformers, through models like BERT, really changed the game in being able to retain memory and information, and those set the foundation for what we now call foundational models. These are large transformer architectures, like GPTs, that, because of the adoption of gen AI, now form the backbone of many, many variations of different types of gen AI.

So, Ed, let's pull that thread and talk a little bit about the landscape of these large language models today. Can you give us a perspective on where we are?

Yeah, as Catherine was mentioning, there are a lot of different architectures available as an initial backbone to build on and attach to. That could mean using the architecture itself and training it from scratch on your own data, or using a pre-trained model that's been released and fine-tuning it to your particular data set or problem. Another part that ties into what Alison mentioned earlier is the tokenizers. Different models use different tokenizers that enable certain things more easily, like counting how many R's there are in "strawberry," or literally just counting numbers and being able to accurately count from 1 to 5 if you're doing some numerical task. So there are choices to be made that affect performance on your problem. Do you have the compute to run it? Llama's new big one is, I think, 400 billion parameters. Okay, you're going to need some muscle, some real compute, to run that. If you want to run it on your phone, that's not an option; you need to pick something smaller. So there are a lot of available options to pick from along a Pareto frontier of what's important to me and what I need to get my problem solved, including all the online API options as well.

So Alison, let's talk a little bit about the risks around generative AI. With the executive order, the OMB guidance, and the NIST AI Risk Management Framework, there are a lot of concerns about these generative models. We've deployed many of these to our federal clients, and I'm curious about our empirical lessons. What have we learned? How have we thought about that risk challenge, and how do we mitigate those risks as we look toward adoption?

I think the risk is really dependent on the use case, and we can't apply a one-size-fits-all solution. Going back to hallucinations: there are some instances where hallucinations don't really matter, because you're not looking for factual outputs. You're looking for creativity and brainstorming, and in certain cases you're not going to optimize on reducing hallucinations. But when you're looking at, for example, policies, and trying to really understand how to implement a policy correctly, you want to be as accurate as possible. In those cases, for those clients, we are doing model steering and retrieval-augmented generation, or RAG, patterns in order to further ground the data. So there are technical techniques that go beyond the standard framework for responsible practices that I think are really important in managing risk. [ decoder noise ]
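The RAG pattern Alison mentions can be reduced to a short sketch. This is a hedged illustration, not a production pattern: the `embed` function here is a toy character-frequency embedding standing in for a real embedding model, the document list stands in for a vector database, and the final prompt would be sent to an actual LLM endpoint in a real deployment.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding (character frequencies); a real system would call an
    embedding model instead."""
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

# A miniature document store; real deployments use a vector database.
documents = [
    "Policy A requires an annual review of all deployed AI systems.",
    "Policy B sets a five-year retention period for training data.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    scores = doc_vectors @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def grounded_prompt(query: str) -> str:
    """Ground the model by placing retrieved passages into the prompt."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# In a real RAG system this prompt goes to an LLM; grounding the answer in
# retrieved text is what reduces hallucinations on factual tasks.
print(grounded_prompt("How often must AI systems be reviewed?"))
```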

Key Takeaways for Federal Leaders


[ ♪ exciting music ♪ ] As we think about it, what are the key takeaways? What should federal leaders be thinking about here? I'm curious: can we continue to leverage the dual-use applications of large language models? Can we harness the power of these models that are being invested in and built? Alison, you talked about the economics of this, right? These are expensive; we're talking hundreds of millions of dollars in some instances. Or does the federal government need to build its own large language models tailored to its applications and use? What are your thoughts there?

I think what we're presenting, again to cut through the hype, is that it's much more than buying an algorithm or connecting to an API. You mentioned LLMs for the government. There are a lot of very specific, classified tasks and mission use cases that make it not so easy to just bring in any LLM out of the box. There's also the important element of realizing that, again, it's not just the architecture; there's also a very serious conversation to have about infrastructure and compute, and, as we touched upon, AI security. So there are many puzzle pieces here when it comes to our government clients using AI in the best way they can.

So, Ed, could you talk a little bit about adversarial AI and secure AI? What does a world look like in which AI becomes more ubiquitous, and what does that mean from a security standpoint?

Yeah, adversarial machine learning is, at its simplest, using machine learning techniques to subvert someone else's algorithm. As you keep pushing on it, you realize there are more and more cases where there might be someone who is motivated, capable, and willing to apply those techniques to you and your problem. So there's a lot of threat modeling here: figuring out, okay, what's the reality of who's going to attack me? What resources do they need to do it, and how? There may be techniques for building more robust algorithms, but there are no techniques for building perfect algorithms. So you have to do some trade-off analysis: okay, if something goes wrong, what processes do I have to remediate that error? Is that good enough? Is that safe enough?

Alison, let me turn to you. We need to secure these models, but we also have to ensure transparency and trust, which I think are really important in government. So as government thinks about adopting this, what role do federal leaders have in helping ensure there's trust in the application of these models?

I think the first thing is establishing really clear regulations and policies. We're starting to move in that direction, but something even more tangible that everyone can understand and interpret, so that they know exactly the scope of these systems, would help. The second thing is obviously responsible AI guardrails. Knowing that they're a requirement and knowing what they entail will be really helpful in engendering more trust across the general public. And thinking a little more about what I was saying earlier, you don't have to be an engineer, but you do need to have or acquire new skills and knowledge for this new technology. The government does need to provide some education, AI literacy essentially, for the general public, so that they understand how it's going to impact their lives and how it works. And then lastly, and this is more about funding and the types of research the government can fund to actually foster more trust, is investing in AI security, AI safety, and things like that. [ decoder noise ]
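One concrete way to picture Ed's point about subverting someone else's algorithm: the fast gradient sign method (FGSM) is a classic adversarial ML technique, shown here against a toy logistic-regression model. This is purely an illustrative sketch of the attack class being discussed, not anything from the primer or from actual deployments.

```python
import numpy as np

# A toy "victim" model: logistic regression p(y=1|x) = sigmoid(w.x + b).
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def predict(x: np.ndarray) -> float:
    return sigmoid(w @ x + b)

def fgsm_attack(x: np.ndarray, true_label: int, epsilon: float = 0.3):
    """Nudge each feature in the direction that most increases the loss.

    For this linear model, the gradient of the cross-entropy loss with
    respect to the input is (p - y) * w, so the attack takes a small
    signed step along that gradient.
    """
    grad_x = (predict(x) - true_label) * w
    return x + epsilon * np.sign(grad_x)

x = np.array([0.2, 0.4, -0.1])
x_adv = fgsm_attack(x, true_label=1)
print("clean prediction:      ", predict(x))      # ~0.39
print("adversarial prediction:", predict(x_adv))  # ~0.16, pushed away from the true label
```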

What’s Next for AI


[ ♪ exciting music ♪ ] We'll turn now and talk about some of the biggest unknowns as we think about where AI is headed next. I'm curious for your thoughts on how you expect generative AI to evolve over the next 3 to 5 years, and what we should anticipate or expect.

The first thing I can say with some confidence is that we're going to see a consolidation of the landscape. There are so many companies out there right now doing various components within the generative AI stack, and I think we're very quickly going to see them consolidate. In that same vein, I think we're going to see companies become more vertically integrated. What I mean by that is, hardware providers like Nvidia are already experimenting with software layers, and software companies are now thinking about creating their own hardware or acquiring hardware companies. And the last thing I want to say, and maybe this is wishful thinking, is that I strongly believe there will eventually be a shift away from the focus on training models. It's a great exercise and it makes great headlines, but what we're going to see over time is a need for more customization and tailoring for specific use cases, and hopefully a bit more reorientation toward pragmatic downstream applications.

Ed, let's talk a little bit about what the future holds in terms of unexpected breakthroughs. What do you anticipate? If you're a betting person and you've got 100 chips, what are you betting on?

I think something a lot of people aren't fully aware of, especially people who aren't in-the-weeds technical, is that the software frameworks and tooling that have been developed as the foundation to build on have gotten exceptionally good. And there are a lot of things that exist, especially in the government sphere, that are implemented in COBOL or Fortran. There's a lot of old code that we don't like to touch, because it scares us to touch it, because it's written in things that, like, our parents wrote, and it works. But you pay a debt when you don't improve your old code. We're reaching a point where it's a very good time to rewrite a lot of that in modern deep learning frameworks, because it will be easier to maintain and will give you more flexibility. So there's a huge amount of potential there that is just waiting for people to realize it and say, yeah, I'm going to put the bet down and say, you two go work together and merge these things. I don't know what specific things I'd predict, but I think that's where a lot of really huge wins are going to come from.

So that's exciting: the retirement of technical debt, right? Catherine, I'm going to give you this softball question to wrap up with. As we think about the anthropomorphization of AI and gen AI, there's obviously a lot of talk about artificial general intelligence, right? The rise of the machines, if you will, as sentient beings. Could you describe or define what that means, and perhaps also prognosticate on how far we are from it?

To prognosticate a little bit: AGI is something that is not only in the realm of ChatGPT. Just because an LLM is really good at reasoning, at running agentic workflows to orchestrate tasks, at chatting with you and giving you facts, that is not a direct indicator of sentience. When Kurzweil and Vernor Vinge talk about the singularity, they're talking about a convergence of multiple types of reasoning across multiple types of AI. We didn't even get a chance to really talk about multimodal AI or multitask AI. So from my standpoint, I think there is definitely a strong acceleration. We're in it; we cannot get out of that magnetic field, right? We are thrust into this acceleration of more and more AI. It'll be more integrated into our lives. But we really, really have to be thoughtful about believing, and calling out, that an LLM is doing things that are AGI and sentient. [ decoder noise ]

Unlocking AI’s Potential to Benefit Humanity


[ ♪ exciting music ♪ ] So we talk about how humans built AI, how they're going to power it, and how we're going to thrive from it. Alison, can you talk a little bit about your vision of how we're going to unlock this potential and how we will thrive from it?

I don't even see how we can't thrive from it, honestly, if you look at all these technology breakthroughs. And going back to what Catherine was mentioning about whether it's going to take our jobs: my answer is vehemently no. It's because if you look at things like the incandescent light bulb and the creation of LED light bulbs, everyone said, oh, we're going to save so much electricity, we're going to use so much less because we have this efficient light bulb. And they didn't find that to be the case at all. In fact, everyone just used more light, right? I think that's exactly what we're going to see across the board with generative AI. And going back to how it can empower people, I think we're really going to thrive from it. I use it every day at my job at Booz Allen and personally, and I find it incredibly compelling in all aspects of my life. I can't wait to see, once it gets out of that chat realm and more embedded into applications, how much better people's lives will become because of it.

Ed, how about you? How do you think we may thrive from this?

I've gotten to work with physicians. I've gotten to work with climatologists. I've gotten to help someone on some quantum computing research where they had an algorithm they wanted to study, and they said, oh, this might take a year for us to run. I rewrote it using JAX and some other deep learning frameworks, and we were able to bring the simulation time down to 22 minutes. And now it's, oh my God, we have our answers now; this is not a year-long thing away, I have it today. To me, that's what's exciting about all of this: we're making it easier to solve more problems, enabling more people to solve problems, and building the tools they can use to go forth and figure out whatever they need to figure out, whether it's how a disease propagates, how a medication helps people, or how we build new encryption schemes that are going to work in 50 years. To me, that is the coolest thing in the world, and I don't understand why other people don't see that we're enabling everyone to be this kind of builder.

Catherine, can I ask you the same question?

I think AI is really allowing us to innovate faster than we have before. We've got a variety of tools, and as Ed was saying, they get better and better. We also have a lot of talent. At least in my time, when I was an undergrad, computer science was very rigid; the classes I took were really about object-oriented programming, data structures, and some compilers. But nowadays the aperture of computer science has widened and attracted a lot of different types of students and learners. Economists, social scientists, biologists, physicists: pretty much every learning discipline has converged upon AI to pick and choose the algorithms that make sense for them. And I think that's really exciting. I don't believe, at least since I've been in school, that I've seen this kind of symposium of people and learning, and this desire to engage with one discipline, like AI. [ decoder noise ]
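Ed's JAX anecdote rests on a pattern worth seeing: express a simulation as pure array operations and let JAX compile the whole loop. The toy damped-oscillator integrator below is only a stand-in for the actual quantum computing code he describes, but it shows the mechanism behind that kind of speedup.

```python
import jax
import jax.numpy as jnp

N_STEPS = 100_000  # number of simulation steps

def step(state, _):
    """One Euler step of a damped harmonic oscillator."""
    x, v = state
    dt, k, damping = 0.01, 4.0, 0.1
    a = -k * x - damping * v          # spring force plus damping
    return (x + v * dt, v + a * dt), x

@jax.jit
def simulate(x0, v0):
    # lax.scan fuses the entire loop into one compiled XLA program;
    # this compilation step is where the large wall-clock wins come from.
    _, trajectory = jax.lax.scan(step, (x0, v0), None, length=N_STEPS)
    return trajectory

traj = simulate(jnp.array(1.0), jnp.array(0.0))
print(traj.shape)  # (100000,) positions over time
```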

Download the Modern Artificial Intelligence Primer for the full story, and watch the entire conversation on YouTube.  

Meet the Panelists

  • Catherine Ordun, Ph.D., is a Booz Allen vice president responsible for rapid AI prototyping and multimodal AI research.
  • Edward Raff, Ph.D., is Booz Allen’s director of emerging AI, where he leads our AI research team.
  • Alison Smith is our director of generative AI and leads solution strategy and investments around large language models. 
  • John Larson (moderator) is a Booz Allen executive vice president and leads our AI practice.

Connect with one of our AI experts.