Engaging in dynamic conversations with AI tools like ChatGPT or Gemini was considered a futuristic fantasy until recently. Today it is a reality: millions of people worldwide are experiencing such interactions first-hand, and the tools’ popularity has surged. You may have used them yourself, but have you ever wondered what gives these chatbots their remarkably human-like qualities? That’s where “Foundational Models” come into the picture, the rising sun of Artificial Intelligence (AI). This development is a seismic shift akin to the invention of the telephone, revolutionizing how people interact with computers and digital systems.
Nevertheless, with every technological stride comes a blend of advantages and challenges. We are embarking on a two-part article series to explore this nuanced dynamic. Our first installment will discuss Foundational Models, their essence, and their potential societal impact. In the next part, we will discuss collective actions and strategies that societies and nations can adopt to adeptly navigate this rapidly evolving landscape.
AI REVOLUTION
Foundational models are AI models (in other words, computer programs) built using massive and diverse data (information) so that they can be applied across a wide range of use cases. They are a source of general knowledge in the AI world and have expertise in several domains. Using them in real life feels like talking to someone who is proficient in many languages, knows a great deal of history, has read much of the internet, paints well, and has a strong grasp of scientific topics. They are versatile and can be fine-tuned (tailored) for specific tasks. One well-known example is language models, which are trained on massive amounts of text and then fine-tuned for tasks like question-answering, language translation, and text summarization. Another example is computer vision models, which are built using a vast collection of images and fine-tuned for image classification, object detection, and image generation.
Fine-tuned foundational models are gaining widespread acceptance and popularity. Tech-savvy people have welcomed them with open arms and are adopting them in their daily lives: making travel plans, writing emails at different levels of formality, reading summaries of major historical events, and even, in the case of students, getting help with homework. Users of these fine-tuned models generate art (images, videos, sceneries, portraits) simply by describing what they want the art to be. Such use cases challenge fundamental notions of creativity and originality. Further, with their ability to process and interpret large volumes of data, these models can provide insights that inform strategic decisions in day-to-day life, education, business, healthcare, finance, government, and beyond. These models may also be integrated into the backend of various systems and operate automatically, for example in healthcare triaging, financial advisory, and legal assistance. As a result, users like us need to be mindful of the extent to which these models influence the systems we rely on. As foundational models become more ubiquitous, their influence will become apparent in how we communicate and consume information, eventually shaping human interactions and societal norms in unprecedented ways. How this technology can influence the cultural landscape and societal discourse is worth considering.
CHALLENGES WITH THIS INNOVATION
Foundational models are compelling, but training (building) them is a formidable undertaking. First, the required data spans much of the open internet and, in some cases, the private enterprise data of big corporations. Next, sourcing that data and building the computational infrastructure to train, test, and deploy the model takes significant financial resources. OpenAI’s CEO Sam Altman said that training GPT-4 (one of the leading language models) cost more than $100 million (Rs 800 crore). As a result of these prerequisites, the mainstream foundational models, such as GPT-4 (OpenAI and Microsoft), LLaMA (Meta), and BERT (Google), are all developed by multinational giants. According to some estimates, the amount of data on which GPT-4 was trained can be compared to a 650 km long line of library shelves full of books, and the computation used to train it is equivalent to running a mid-sized laptop for 7 million (70 lakh) years [ref]. Several estimates say that the cost of training GPT-5 (the successor of GPT-4) could be up to $2.5 billion (Rs 20,000 crore, which is ~20% of India’s annual budget for education). Such high costs will restrict the development of powerful AI tools to only the largest and best-funded organizations.
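For readers who want to check the rupee figures above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the exchange rate of Rs 80 per US dollar is an assumption implied by the article's own pairing of $100 million with Rs 800 crore, not an official or current rate, and the names `RUPEES_PER_USD` and `usd_to_crore` are hypothetical helpers introduced here.

```python
# Assumed exchange rate, implied by pairing $100 million with Rs 800 crore.
# It is an illustrative figure, not an official or current rate.
RUPEES_PER_USD = 80
RUPEES_PER_CRORE = 10_000_000  # 1 crore = 10 million


def usd_to_crore(usd: float) -> float:
    """Convert a US-dollar amount to crores of rupees at the assumed rate."""
    return usd * RUPEES_PER_USD / RUPEES_PER_CRORE


gpt4_cost_crore = usd_to_crore(100e6)  # reported lower bound for GPT-4
gpt5_cost_crore = usd_to_crore(2.5e9)  # upper-end estimate quoted for GPT-5

# If Rs 20,000 crore is roughly 20% of the education budget, the budget
# implied by that comparison is the cost divided by 0.20.
implied_education_budget_crore = gpt5_cost_crore / 0.20

print(f"GPT-4 training cost: Rs {gpt4_cost_crore:,.0f} crore")
print(f"GPT-5 estimate: Rs {gpt5_cost_crore:,.0f} crore")
print(f"Implied education budget: Rs {implied_education_budget_crore:,.0f} crore")
```

At the assumed rate, the quoted GPT-5 estimate is 25 times the reported GPT-4 figure, and the "~20%" comparison implies an education budget of about Rs 1 lakh crore.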
As corporations race to build the most efficient and sellable Foundational Models, fundamental issues are coming to the fore, showing the risks associated with these models’ widespread use and acceptance. First and foremost among them is the problem of embedded bias. A series of tweets shows the responses of a Large Language Model developed by an MNC to questions about whether or not the heads of state of some countries were fascist. Surprisingly, the model gives biased answers in the Indian context and offers a neutral response in others. The source of this bias is worth pondering. Does it stem from how the model is built and trained, from the source of the data, or from who is training the model? Similarly, a computer vision model was widely criticized for generating biased responses to factually straightforward questions. There are several other instances where fine-tuned models based on the foundational models have exhibited severe biases. Furthermore, the control of these tools by big corporations raises the question: What can we do to ensure that these models are accurate, grounded in facts, free from ideological biases, and not presenting a fictional reality?
We leave you with this question to ponder. In our next article, we will discuss how societies and nations adapt (and can adapt) to this quickly shifting terrain.
Akshay Jajoo graduated from IIT Guwahati with a B.Tech. in Computer Science with a Gold medal and holds a Ph.D. in CS from Purdue. Ksheera Sagar received his Integrated M.Sc. in Mathematics & Computing from IIT Kharagpur and a Ph.D. in Statistics from Purdue. Both currently work as researchers in industry. The views in the article are the authors’ personal opinions. The authors also want to thank their friends Anil, Avatans, Apoorva, Nimayi, and Prateek for their feedback.