In 1983, the Soviet Union’s Oko early-warning system issued a critical alert, signalling an imminent nuclear strike from the US. The system, based on satellite data and algorithmic analysis, had malfunctioned, misinterpreting sunlight reflections on high-altitude clouds as missile launches. The officer on duty, Stanislav Petrov, faced a stark dilemma: trust the seemingly precise output, or rely on human intuition shaped by broader context and uncertainty. He chose the latter, averting a nuclear catastrophe.
This moment serves as a haunting precursor to the challenges we now face with AI. It highlights the philosophical question of epistemic reliability: How do we ensure that machine-generated knowledge aligns with truth in high-stakes scenarios?
The electronics and IT ministry recently organised a consultation to establish the India AI Safety Institute, reflecting global efforts to address the multifaceted challenges posed by advanced AI technologies. The US, UK, European Union, Japan, Singapore, South Korea, Canada, France, Kenya and Australia have already established institutes to evaluate AI systems, conduct adversarial testing, and develop methodologies for mitigating risks such as bias, manipulation and unintended behaviour. However, such institutes should also confront the deeper ethical questions.
An AI safety institute should examine the epistemological and ethical dimensions of decision-making. What does it mean for an AI system to “understand” risk? How can it differentiate between signal and noise in contexts it has not been explicitly trained for? And how do we build into systems the humility to defer when certainty is an illusion? These questions lie at the intersection of philosophy, ethics and systems design, defining the very essence of safe AI.
AI systems are built on probabilistic models, programmed to infer conclusions from data patterns. However, their epistemic framework is inherently narrow, confined to the parameters of their training and the assumptions embedded by their designers. In contrast, human judgement often draws on tacit knowledge, an intuitive synthesis of experience, context and uncertainty. This highlights the philosophical distinction between computing and knowing.
For an AI safety institute, this raises a crucial question: can machines ever possess an epistemic framework broad enough to account for unquantifiable uncertainties? If not, how do we design systems that recognise the limits of their knowledge, akin to Petrov’s decision to distrust the system when faced with conflicting signals?
The Oko incident also highlights the ethical question of deference. Machines act on pre-determined thresholds set by programmers but lack the moral capacity to evaluate the stakes of their decisions in human terms. This leads to a broader question: should AI systems always defer to humans in critical scenarios, or should they act autonomously when speed is of the essence? This is also a concern with lethal autonomous weapon systems.
The philosophical tradition of virtue ethics offers one way to think about this. Aristotle argued that virtuous action depends on phronesis, or practical wisdom rooted in moral character. An AI, no matter how advanced, cannot possess this, raising the risk of decisions devoid of ethical nuance. An AI safety institute must therefore grapple with the design of systems that can incorporate ethical constraints while remaining operationally effective. Can machines be designed to “know” when to stop, seek human input, or even refuse to act?
The Petrov dilemma also invites us to question the ontology of agency in machines. Do we treat AI systems as independent agents capable of making decisions, or merely as extensions of human intent? The former implies a need to grant them some level of moral accountability, while the latter suggests humans must always bear ultimate responsibility for their actions. Yet, as AI systems grow more complex, the boundaries blur. The safety institute must tackle these ambiguities, developing frameworks that address the paradox of accountability.
At a deeper level, the 1983 incident underscores the tension between certainty and uncertainty in decision-making systems. Modern AI systems are designed to operate in environments of calculable risk, but struggle in situations of radical uncertainty, where unknowns cannot be parameterised. The philosopher John Rawls introduced the concept of the “veil of ignorance” to address fairness in human decision-making. A similar principle could be applied to AI safety: how do we design systems that make decisions as if they are unaware of their own biases and limitations, ensuring a level of ‘humility’ in their outputs?
A critical dimension is the challenge of value pluralism, which arises from the incommensurability of human values embedded in decision-making contexts. Unlike humans, whose judgements are shaped by competing priorities such as equity, efficiency, and cultural norms, AI systems operate on predefined optimisation criteria that may fail in scenarios requiring moral trade-offs. For instance, a self-driving car confronted with an unavoidable accident must “decide” between minimising overall casualties and prioritising its passengers.
Drawing on Isaiah Berlin’s concept of value pluralism, it is clear that no universal framework, whether utilitarianism, deontology, or situational ethics, can resolve all such conflicts. This necessitates the development of adaptive systems capable of integrating multiple value paradigms while recognising context-specific nuances. Moreover, encoding such values into AI systems requires participatory governance mechanisms to prevent the privileging of dominant or exclusionary perspectives.
By engaging policymakers, ethicists, technologists and civil society, AI safety institutes must establish deliberative frameworks to ensure ethical alignment that is both inclusive and operationally feasible. Without these safeguards, AI risks perpetuating systemic biases or making ethically indefensible decisions in high-stakes environments.
(Views are personal)
(On X @adityasinha004)
Aditya Sinha | Public policy professional