
Cuneiform: How AI is revealing the secrets within the world’s oldest texts

BEHIND a locked door in the British Museum, London, there is a beautiful library with high, arched ceilings. Inside this secret room, Irving Finkel opens a drawer and pulls out a clay tablet. Cracked and burnt, it is imprinted with the tiny characters of the world’s oldest written language. It is a list of omens. Another drawer reveals another tablet. “This is a prayer to the god Marduk,” says Finkel, who is assistant keeper of ancient Mesopotamian script, languages and cultures at the museum, and one of only a handful of people in the world who can read this long-dead script, known as cuneiform, fluently.

Behind us, a photographer is meticulously capturing images of this writing, with lights positioned to highlight the indented etchings. This work is part of a revolution, one that is using today’s computing power to bring this 5000-year-old record back to life and unlock new secrets of the world’s first civilisation.

Although this system of writing was deciphered 165 years ago (see “Reading the signs”), the majority of texts that use it have never been translated into modern languages – a fiendishly complicated task that relies on experts such as Finkel. Now, thanks to developments in artificial intelligence, computers are being trained to read and translate cuneiform, to put fragmented tablets back together to recreate ancient libraries and even to predict bits of missing text. These tools are enabling the earliest works of literature to be read in full for the first time since antiquity, giving insights into stories that later appeared in the Bible and shedding light on civilisations at the dawn of history.

The story of cuneiform begins around 6000 years ago in Mesopotamia, the fertile region between the Tigris and Euphrates rivers that is now Iraq, when there was a shift from living in small agricultural settlements to large urban centres. Here, the Sumerian people built the first city states. Uruk was one of the most important. With temple complexes and a canal system, it was home to up to 50,000 people by 3000 BC, and was the administrative hub for the region with a bureaucracy to manage the complex system of labour that had developed.

Although these people spoke a language (Sumerian) that is completely different from any other that we know of, and has long since died out, we have an incredible record of their lives because, as far as we know, it is here that writing originated. It was made by pressing the end of a reed into moist clay to make wedge-like shapes, giving this script its modern name: cuneiform, from the Latin cuneus, which means wedge.

Although we now associate writing with poetry and literature, this early example was nothing of the sort. It was used solely for administrative purposes, to keep track of the transfer of slaves, for instance, or the receipt of animals. A typical example shown to me at the British Museum is a record of rations of beer, with a drawing of a jar denoting the beer, a person’s head and circles to signify the amount. Soon, these pictographic signs evolved to become increasingly abstract.

It took a long time for cuneiform to shift from record-keeping to a tool for linguistic expression. The first royal inscriptions appeared around 2700 BC, and the earliest literary texts a hundred or so years later. One of the first known authors was Enheduanna, a princess, priestess and poet who lived around 4300 years ago. She wrote many hymns and the myth of Inanna and Ebih, which recounts a conflict between a goddess and a mountain. The most famous text of all is the Epic of Gilgamesh, which tells of a king’s quest for eternal life and includes a section that appears to be a precursor to the biblical story of the flood.

The impact of Sumerian culture still ripples through our lives today, not only through our biblical stories, but in our clocks. Their sexagesimal counting system, with a base of 60, is the reason why we have 60 seconds in a minute and 360 degrees in a circle.
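To make the base-60 idea concrete, here is a minimal Python sketch of how a whole number breaks down into sexagesimal digits, which is the same arithmetic we use when turning seconds into hours, minutes and seconds. The function names are illustrative assumptions, not anything taken from the tablets or from a particular library.

```python
def to_sexagesimal(n: int) -> list[int]:
    """Return the base-60 digits of a non-negative integer, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        # Peel off the least significant base-60 digit on each pass.
        n, remainder = divmod(n, 60)
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]


def from_sexagesimal(digits: list[int]) -> int:
    """Rebuild the integer from base-60 digits, most significant first."""
    value = 0
    for digit in digits:
        value = value * 60 + digit
    return value


# 3661 seconds is 1 hour, 1 minute and 1 second.
hours_minutes_seconds = to_sexagesimal(3661)
```

Here `to_sexagesimal(3661)` gives `[1, 1, 1]`, and `from_sexagesimal` reverses the conversion.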

A depiction of the cuneiform poet Enheduanna

Hoberman Publishing/Alamy

Ancient letters

Cuneiform itself isn’t a language, but a writing system, similar to how the letters used to write English can also be used for French or German. Sumerian eventually died out. The cuneiform script, meanwhile, lived on and became the written form of many other languages, such as Akkadian, Hittite and Old Persian. It was in use for 3000 years before it, too, died out, recording the births and deaths of ancient kingdoms. We know this thanks to the clay that cuneiform was inscribed on: a cheap, readily available and durable material. “It’s fortunate for us, because any tablet that was ever written survives, unless it was thrown in the river or smashed completely,” says Finkel.

Thousands of these tablets are around today, forming a key part of the world’s cultural heritage. They are chronicles of our planet’s first great empires, as well as hymns, letters, shopping lists and even customer complaints. “People say the first half of human history is only recorded in these cuneiform tablets,” says Enrique Jiménez at Ludwig Maximilians University in Munich, Germany.

Deciphering the past

New secrets from the tablets are constantly coming to light. In 2017, a small, 3700-year-old tablet known as Plimpton 322 was identified as the world’s oldest trigonometric table, showing that the Babylonians – Akkadian-speaking people living in central and southern Mesopotamia – not the Greeks, were the first to study trigonometry. And last year, a new analysis of a tablet excavated in Iraq in 1894 showed that the Babylonians calculated with triangles centuries before Pythagoras. Yet with only around 75 people who can read cuneiform fluently, the majority of tablets lie unread, gathering dust in the back rooms of museums.


Plimpton 322, a Babylonian clay tablet containing ancient mathematics


One issue is that cuneiform is incredibly complex. “The script is very ambiguous. There is no single way of writing a word,” says Jiménez. In addition, most of the tablets are incomplete. The majority of cuneiform tablets are broken, chipped or smashed to pieces. Often, the edges have crumbled away, leaving stories without beginnings or ends, or with gaps in the narrative.

This is the case for the world’s oldest surviving royal library, that of King Ashurbanipal of the Assyrian Empire. In the city of Nineveh, close to modern-day Mosul in northern Iraq, Ashurbanipal assembled a vast library of written works from across Mesopotamia. This amounted to 30,000 tablets, containing everything from rituals and medical encyclopaedias to astronomical observations and the exploits of royals. The writer H. G. Wells called it “the most precious source of historical material in the world”, but it was reduced to rubble and burned when the city was sacked in 612 BC. I see evidence of this first-hand during my trip to the British Museum, where the remnants of this library are now stored, their blackened scorch marks still visible.

Piecing these fragments together is like assembling a number of complex jigsaw puzzles whose pieces have become jumbled up, with no picture on the boxes to tell you what to aim for, says Jiménez. What’s more, fragments from the same tablet can be scattered around the world. “There’s a tablet where there’s a piece in Chicago, which joins a piece in Berlin and a piece here,” says Finkel. Putting the puzzle back together is a painstaking process that relies on luck and memory. It took more than 100 years to identify the beginning of the Epic of Gilgamesh in a small fragment stored in a museum drawer, for instance. But now that computers are involved, things are changing.

The Fragmentarium, part of the Electronic Babylonian Literature project, set up by Jiménez in 2018, is using AI to reassemble Ashurbanipal’s library and other great collections written in cuneiform by working out which fragments belong together. To do this, Jiménez is using algorithms developed to compare different variants of gene sequences, based on the fact that there are often multiple copies of the same text with minor variations. The AI can be trained on transliterations of these texts, in which cuneiform characters have been written in the Latin alphabet according to the way they sound (in the same way that Chinese characters can be written in Pinyin, their Mandarin pronunciation). The AI can then predict which cuneiform signs are likely to be in the missing segments. It can also search for a particular cuneiform sign in a huge database of fragments.
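The idea of borrowing gene-sequence comparison to match fragments can be illustrated with a small Python sketch. It uses Smith-Waterman local alignment, a classic bioinformatics technique, as a stand-in; the article doesn't say which algorithms the Fragmentarium actually uses, so this is an assumption for illustration only. The sign tokens (`s1`, `s2` and so on), the manuscript names and the scoring values are all hypothetical placeholders, not real transliterations.

```python
def local_alignment_score(fragment, text, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score between two token sequences."""
    previous = [0] * (len(text) + 1)
    best = 0
    for frag_token in fragment:
        current = [0]
        for j, text_token in enumerate(text, start=1):
            # Extend the alignment diagonally, or open a gap in either sequence.
            diagonal = previous[j - 1] + (match if frag_token == text_token else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def best_match(fragment, manuscripts):
    """Return the name of the manuscript that aligns best with the fragment."""
    return max(manuscripts, key=lambda name: local_alignment_score(fragment, manuscripts[name]))


# Placeholder data: each token stands for one transliterated sign.
manuscripts = {
    "text_A": "s1 s2 s3 s4 s5 s6 s7 s8".split(),
    "text_B": "s9 s2 s10 s11 s5 s12 s13 s14".split(),
}
# A fragment that overlaps text_A but has one variant sign ("x").
fragment = "s3 s4 x s6 s7".split()

likely_source = best_match(fragment, manuscripts)
```

Because local alignment tolerates the odd mismatch, the fragment still scores highest against `text_A` despite the variant sign, which is why this family of methods suits texts that survive in multiple, slightly different copies.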


A wall relief of King Ashurbanipal’s lion hunt

Shutterstock/Viacheslav Lopatin

In 2019, this approach assisted with the identification of several missing pieces of the Epic of Gilgamesh, as well as revealing a new genre of ancient literature: a text consisting of parodies (including jokes about donkey dung) that was used by school children to help them learn to write. And together with Anmar Fadhil at the University of Baghdad in Iraq, Jiménez is also piecing together another previously unknown genre, a hymn to a city, in this case the city of Babylon, featuring details of temple life and cultic prostitutes.

Then last year, in the world’s first fully autonomous cuneiform fragment identification using AI, a missing piece of the famous Poem of the Righteous Sufferer (which explores the question of why bad things happen to good people, and seems to be a precursor to the biblical Book of Job) was identified. “Humans would have missed this,” says Jiménez.

Other researchers have turned their attention to the seemingly mundane administrative tablets. “There’s a multitude of small receipt texts – written traces of transactions that occur between different institutions such as temples or the palace of the local ruler, or among individuals such as merchants,” says Émilie Pagé-Perron at the University of Oxford. Collectively, they hold a wealth of information that cuts to the heart of the ancient civilisations of Mesopotamia. Sumerian texts, for instance, often contain the names of individuals and dates, meaning it is possible to trace a person’s role in society. The 80 tablets known as the Mama-ummi archive, dating from around 2300 BC, show that a female supervisor named Mama-ummi was in charge of a team of 180 weavers, and that there were surprisingly varied job opportunities for women at the time.

To help wade through this sea of administrative information, the Machine Translation and Automated Analysis of Cuneiform Languages project was set up in 2017 by Heather Baker at the University of Toronto, and coordinated by Pagé-Perron. In the most recent experiments, different algorithms trained on 45,500 transliterated phrases, each consisting of up to 19 words, were tested for their ability to translate Sumerian words into English. Results published last year show that one particular algorithm could translate with an accuracy of 95 per cent. The system also pulls out key information from the texts, identifying categories such as people, places and gods.

Last year, computer scientist Gabriel Stanovsky at the Hebrew University of Jerusalem and his colleagues found a way to predict the text on missing parts of fragments, in a similar way to that of automatic prediction of words on mobile phones. They used a deep-learning AI, feeding it transliterations from 10,000 cuneiform tablets, written in Akkadian, and found that it could suggest contextually correct words to fill the gaps with an accuracy of 89 per cent.
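The gap-filling idea can be sketched in a few lines of Python. This toy version simply counts which token has been seen between a given left and right neighbour, which is far simpler than the deep-learning model Stanovsky’s team used and is offered only as an assumption-laden illustration. The English “corpus” and the function names are hypothetical placeholders standing in for transliterated Akkadian.

```python
from collections import Counter, defaultdict


def train_context_counts(sentences):
    """Count which token appears between each (left, right) pair of neighbours."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for left, middle, right in zip(padded, padded[1:], padded[2:]):
            counts[(left, right)][middle] += 1
    return counts


def suggest(counts, left, right, k=3):
    """Suggest up to k tokens for a gap, most frequently seen first."""
    candidates = counts.get((left, right), Counter())
    return [token for token, _ in candidates.most_common(k)]


# Placeholder training text, not real tablet content.
corpus = [
    "the king built the temple".split(),
    "the king built the wall".split(),
    "the scribe wrote the tablet".split(),
]
counts = train_context_counts(corpus)

# Fill the gap in "king [missing] the".
gap_suggestions = suggest(counts, "king", "the")
```

With this tiny corpus, the only token ever seen between “king” and “the” is “built”, so that is the single suggestion; a context the model has never seen returns an empty list.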

Another potential application of AI is the dating of tablets whose origin is unknown. “If we know the dates for certain documents, we can train the algorithm to predict the missing dates for others,” says Stanovsky.

Deciphering cuneiform from its transliteration is one thing. Reading the cuneiform characters themselves is quite another. Not only does the cuneiform script evolve over time, but spellings vary considerably, and the script was also used for different languages at different times. “The other thing to keep in mind is we’re dealing with handwriting from individual scribes,” says Miller Prosser at the University of Chicago. On top of this, there are no gaps between signs, so it is difficult to work out which group of wedges forms a character. And whereas the Latin alphabet contains 26 letters, there are more than 900 different cuneiform signs, which can appear remarkably similar.

Despite this, computers are beginning to make inroads into reading cuneiform signs, using the same kind of computer vision systems used for text recognition. For instance, Prosser and his colleagues have trained a machine learning system called Deepscribe to detect signs on thousands of tablets from the Persepolis Fortification Archive – a trove of administrative texts written in the Elamite language from around 500 BC, found in a fortification wall. “The ability of a computer to identify the boundaries of a sign and draw a box around it is a huge achievement on its own. No one usually knows where one sign ends and the next sign begins,” says Susanne Paulus, also at the University of Chicago and part of the team that carried out this work.

Instant translation

The hope is to eventually link sign recognition systems with modern language translation systems. This would mean that we could take a picture with our phone of a tablet in a museum and get an instant read-out of what it says.

None of these efforts would be possible without large digital databases of texts to provide as much data as possible to train algorithms – so they can learn, for example, which words are likely to be written next to each other. Yet, of the half a million cuneiform texts in the world’s museums, only half have been transliterated or translated, and only around 100,000 are available digitally. Efforts such as the Cuneiform Digital Library Initiative and Electronic Babylonian Literature project are now making great strides to boost these digital archives.

“Having the tools to digitise large volumes of text brings a lot of new information and new connections to scholars,” says Shai Gordin at Ariel University, Israel. “I think the next big breakthrough will come once we can put this information in a large network of connections. This way, we can build a portfolio of the life of ancient people.”


World History Archive/Alamy; Dinendra Haria/Alamy

That process starts with the painstaking work of taking high-quality images of all tablets held in museums and private collections around the world. That is exactly what is going on behind me at the British Museum, where images of all 40,000 smashed pieces of Ashurbanipal’s library are being captured as part of the Electronic Babylonian Literature project. In a special desktop photo studio, photographer Alberto Giannese takes six images of each cuneiform tablet – front, back, top, bottom and sides. The text doesn’t always stop at the edges, says Giannese, and even distinguishing front from back and top from bottom can be hard.

The six images are then automatically stitched together by computer software and deciphered and translated by cuneiform experts such as Jiménez. By 2023, the entire collection of images will be available to the public. Not everyone can visit the secret back rooms of museums, but soon we will all be able to view Ashurbanipal’s library and many other long-lost cuneiform texts from the comfort of our homes.

As my visit to the arched library of the British Museum ends and I join the crowds thronging to the exhibitions, I feel blown away at what I have just experienced – travelling back in time to witness the written thoughts of people from thousands of years ago, then back to the future to see these fractured ancient texts in the process of being reassembled again and decoded, this time in the digital realm. I think Ashurbanipal would be proud.

Reading the signs

Cuneiform would probably never have been deciphered without the Behistun Inscription. This trilingual monument high up an inaccessible mountain in Iran is the cuneiform version of the Rosetta Stone, another ancient trilingual text that proved to be the key to cracking Egyptian hieroglyphics.

The Behistun Inscription was carved around 520 BC to commemorate a rebellion quashed by the Persian King Darius, with versions in three different cuneiform languages – all undeciphered at the time when Western explorers first climbed up some wobbly ladders to make copies of it in 1764.

The first of these languages to be cracked was Old Persian. From the way that names such as Darius were written, another script on the monument could be deciphered. This turned out to be Akkadian, an extinct language spoken in ancient Mesopotamia. “The Akkadian tongue gradually gave up its secrets,” says Irving Finkel at the British Museum in London. “Without that key, I don’t think anyone could have ever done it.”

From there, it was relatively straightforward to crack the original cuneiform language – Sumerian – using the many bilingual texts where scribes had written in Akkadian and underneath had added a Sumerian translation.
