Digital Classics Project
Livy's Ab Urbe Condita



Many scholars have commented on the complex and unique style and diction of Livy's Ab Urbe Condita. Quintilian, an early critic of the work, praised Livy for his "richness and fullness," or his "lactea ubertas." On the other hand, Livy has also been the subject of significant criticism for his pleonastic and redundant writing style. A notable critic, Caligula, dismissed Livy as "verbosum in historia neglegentemque," that is, wordy and careless as a historian (Sueton. Calig., 34). Contrasting critiques of Livy's historiographic style have persisted in the literature. As tools for analyzing classical texts have become more robust in recent years, specifically with regard to the emerging field of Digital Classics, Livy's Ab Urbe Condita presents an ideal opportunity for implementing advanced machine learning, visualization, and language processing techniques to better formulate arguments pertaining to Livy's style and diction.

Numerous attributes position Ab Urbe Condita for this unique type of computational analysis. First, Livy wrote the text over a period of forty to forty-five years. Thus, we can expect that Livy's style evolved over that time period, and earlier quantitative scholarship supports this expectation. Moerk (1970)[4] found that Livy was the only one of the nine comparable authors who showed statistically significant differences across individual samples of his work, based on 19 style-based features in a chi-square test. Thus, highly sensitive and properly trained algorithms can address these nuances in style that evolve over Livy's career.

Additionally, many authors have commented on Livy's oratorical style in his writing. Given the ornate and verbose style employed during the era of Livy's authorship, some scholars believe that his writing style reflects a refusal to fully adopt these contemporary practices. In fact, many scholars have concluded that Livy maintained a natural style with a proper balance of ornamentation, in order to keep the history itself pure. Nevertheless, this perspective exists in the presence of other forms of criticism, which assert that Livy used too little discretion or discrimination in his historical sources. Despite these widely varying interpretations, Livy's sentence structure is still more complex than that of comparable writers, such as Cicero.[1] Much of the literature attributes a special Livian fullness and sense of balance to these structures, but this style has also been construed as overly embellished purple prose.[5] In a unique comparison of Polybius to Livy's Book 21, where he details the Hannibalic Wars, Ebeling (1907)[2] highlights differences in historiographic emphasis between the two authors. Specifically, Ebeling highlights that throughout Hannibal's passage across the Alps, Livy condenses the narrative portion in exchange for dramatism and emotional appeal. This style differs significantly from that of Polybius, who instead is said to focus on the circumstantial and factual details.[2]

Finally, throughout Livy's work, one can find instances of poetic enrichment, which further distinguish his prose style and encourage textual analysis of his work. This employment of poetic style begins with the opening of the text, which is written in dactylic hexameter. Many scholars have commented on the presence of Vergilian elements throughout the text as well. The cumulative result of these nuances in Livy's style includes powerful graphic, scenic depictions, dramatism, profound speeches with conventional oratorical style, character development, and use of thematic elements throughout.[3]

Overall, the largely inconclusive body of literature pertaining to Livy's style positions Ab Urbe Condita in the spotlight for novel analysis. Motivated by a long history of analysis addressing the unique facets of Livy's diction and historiographical tone, this project explores several questions pertaining to Livy's vocabulary and sentiment throughout the first pentad, enabled by contemporary tools from computer science and data science.


Essential Questions

1. In what ways does Livy exhibit an abbreviated vocabulary in the first pentad of Ab Urbe Condita, if at all?

2. What is the sentiment of Livy's history and how does this sentiment evolve over the first pentad?

3. How does the sentiment relate to historic scenes?


Scroll through the boxes to view the two types of methodology employed in this work.


In order to investigate the essential questions of this project, I analyzed both the Latin text and an English translation. For questions pertaining to word choice, the Latin text was utilized, whereas for questions pertaining to sentiment, the English translation was utilized. Before working with the Latin text, I had to perform lemmatization, removal of stopwords, and punctuation normalization, described below. Similarly, before working with the English translation, I had to employ a variety of text processing techniques in order to prepare the text for sentiment analysis, including stemming, removal of stopwords, and punctuation normalization.
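The punctuation normalization and stopword removal steps can be sketched in Python as follows. This is a minimal illustration and not the project's actual code: the function names and the five-word stopword list are hypothetical, and a real analysis would use a fuller, curated list.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only.
LATIN_STOPWORDS = {"et", "in", "ad", "non", "cum"}


def normalize(text: str) -> list[str]:
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


sample = "Iam primum omnium satis constat, Troia capta in ceteros saevitum esse Troianos."
tokens = remove_stopwords(normalize(sample), LATIN_STOPWORDS)
print(tokens)
```

The same two functions apply to the English translation if the stopword set is swapped for an English one; lemmatization (Latin) or stemming (English) would then run on the returned tokens.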


Textual Sources

Latin Textual Source

I built a corpus of Livy's first pentad and Book 21, as well as selections from the remains of the first five books of Tacitus' Annales and the first and second parts of Sallust's Bellum Iugurthinum. All Latin text was sourced from The Latin Library.

Via novel lemmatization methods built strictly for the analysis of Latin text, I performed an analysis of Livy's diction.

English Translation Source

In addition to a Latin corpus, I built a corpus of companion English translations only for the first pentad and Book 21 of Livy's AUC.

All translations were adapted from Bruce J. Butterfield from the Kennedy Association Webpage.

The companion English translations enabled sentiment analysis of Livy's history over the first pentad and Book 21. Sentiment analysis was performed via the Porter Stemmer Algorithm with AFINN sentiment analysis, described in the methods section.


Textual Analysis

Classical Language Toolkit

For this project, I used the Classical Language Toolkit (CLTK), an open-source natural language processing library for Classical languages, and specifically its lemmatization methods. The purpose of lemmatization is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. I ran the lemmatizer on every word of every chapter in each book analyzed from Livy's AUC in order to normalize each word to its dictionary headword (for example, the nominative singular of a noun). The same method was applied to the works of Tacitus and Sallust for comparative analysis. The CLTK lemmatizer performs this normalization via a key-value store that maps inflected forms to lemmas. For ambiguous forms, the lemmatizer selects the candidate lemma that occurs most frequently. This frequency-based selection does not yield 100% accuracy; however, given the low percentage of ambiguous forms in the first pentad, the method sufficed for this project, and it is currently used in comparable projects in the digital classics field. Lemmatization normalized the corpus, which motivated questions around Livy's diction and enabled the creation of word clouds with the wordcloud package for the R statistical language.
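To make the key-value approach concrete, here is a minimal Python sketch of a lookup lemmatizer that resolves ambiguous forms by frequency. It does not call CLTK; the table, its frequency counts, and the function name are hypothetical and only illustrate the idea described above.

```python
# Hypothetical lookup table: each inflected form maps to its candidate lemmas,
# each with a made-up corpus frequency (these numbers are not CLTK data).
LEMMA_TABLE = {
    "urbem": {"urbs": 120},
    "urbis": {"urbs": 95},
    "bello": {"bellum": 80, "bello": 12},  # noun "war" vs. verb "to wage war"
    "reges": {"rex": 60, "rego": 9},       # "kings" vs. "you will rule"
}


def lemmatize(token: str, table: dict[str, dict[str, int]]) -> str:
    """Return the most frequent candidate lemma, or the token itself if unknown."""
    form = token.lower()
    candidates = table.get(form)
    if not candidates:
        return form  # unknown form: pass it through unchanged
    return max(candidates, key=candidates.get)


print([lemmatize(t, LEMMA_TABLE) for t in ["Reges", "urbem", "bello"]])
```

Because the most frequent candidate always wins, a rarer reading such as the verb form of "reges" is never selected, which is one source of the inaccuracy noted above.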


Porter Stemmer Algorithm

The Porter Stemmer Algorithm originated from the following publication:

C.J. van Rijsbergen, S.E. Robertson and M.F. Porter, 1980. New models in probabilistic information retrieval. London: British Library. (British Library Research and Development Report, no. 5587).

The algorithm defines a consonant as a letter other than A, E, I, O or U, and other than Y preceded by a consonant; any other letter is a vowel. A run of one or more consecutive consonants is denoted by C and a run of one or more consecutive vowels by V, so that every word can be written in the form [C](VC)^m[V], where the square brackets mark optional parts. The exponent m counts repetitions of the VC pattern, such that (VC)^m with m=2 denotes VCVC, and m is called the measure of the word. It is the case that:

1. Da (CV), m=0

2. Dan (CVC), m=1

3. Danger (CVCVC), m=2
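The measure can be computed by collapsing a word into its alternating C/V pattern and counting the VC pairs. The following Python function is a small sketch of that definition; the function name is my own, and the example words tree, trouble and troubles are the ones used in Porter's description of the algorithm.

```python
VOWELS = "aeiou"


def measure(word: str) -> int:
    """Return the Porter measure m: the number of VC pairs in [C](VC)^m[V]."""
    pattern = ""
    prev_is_consonant = False
    for i, ch in enumerate(word.lower()):
        if ch in VOWELS:
            is_consonant = False
        elif ch == "y":
            # Y counts as a vowel only when it is preceded by a consonant.
            is_consonant = not (i > 0 and prev_is_consonant)
        else:
            is_consonant = True
        symbol = "C" if is_consonant else "V"
        # Collapse runs of the same class into a single C or V.
        if not pattern or pattern[-1] != symbol:
            pattern += symbol
        prev_is_consonant = is_consonant
    return pattern.count("VC")


for w in ("da", "tree", "trouble", "troubles"):
    print(w, measure(w))
```

Here "da" and "tree" both collapse to CV (m=0), "trouble" to CVCV (m=1), and "troubles" to CVCVC (m=2).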

In order to rewrite a suffix (i.e. S1 -> S2), the word must end in S1 and the stem that precedes S1 must satisfy the stated condition on its V/C representation, such as m>0. Below are the 5 essential steps in the Porter Stemmer algorithm. Where no replacement is given, -- denotes that the suffix is removed entirely.



Step 1

(m>0) ES -> --

(m>0) S -> --

Step 2

(m>0) ATIONAL -> ATE

(m>0) TIONAL -> TION

(m>0) ENCI -> ENCE

(m>0) ANCI -> ANCE

(m>0) IZER -> IZE

(m>0) ABLI -> ABLE

(m>0) ALLI -> AL

(m>0) ENTLI -> ENT

(m>0) ELI -> E

(m>0) OUSLI -> OUS

(m>0) IZATION -> IZE

(m>0) ATION -> ATE

(m>0) ATOR -> ATE

(m>0) ALISM -> AL

(m>0) IVENESS -> IVE

(m>0) FULNESS -> FUL

(m>0) OUSNESS -> OUS

(m>0) ALITI -> AL

(m>0) IVITI -> IVE

(m>0) BILITI -> BLE

Step 3

(m>0) ICATE -> IC

(m>0) ATIVE -> --

(m>0) ALIZE -> AL

(m>0) ICITI -> IC

(m>0) ICAL -> IC

(m>0) FUL -> --

(m>0) NESS -> --

Step 4

(m>1) AL -> --

(m>1) ANCE -> --

(m>1) ENCE -> --

(m>1) ER -> --

(m>1) IC -> --

(m>1) ABLE -> --

(m>1) IBLE -> --

(m>1) ANT -> --

(m>1) EMENT -> --

(m>1) MENT -> --

(m>1) ENT -> --

(m>1 and (*S or *T)) ION -> --

(m>1) OU -> --

(m>1) ISM -> --

(m>1) ATE -> --

(m>1) ITI -> --

(m>1) OUS -> --

(m>1) IVE -> --

(m>1) IZE -> --

Step 5

(m>1) E -> --
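The way a single rule of the form (condition) S1 -> S2 is applied can be sketched in Python as below. This is an illustrative fragment rather than a full Porter stemmer: it covers only a handful of the (m>0) and (m>1) rules listed above, it omits the extra (*S or *T) condition on ION, and the names measure, apply_rules, M_GT_0_RULES and M_GT_1_RULES are my own. Within one group of rules, the longest matching suffix is the only one tested.

```python
VOWELS = "aeiou"


def measure(stem: str) -> int:
    """Return the Porter measure m of a stem (number of VC pairs)."""
    pattern = ""
    prev_is_consonant = False
    for i, ch in enumerate(stem.lower()):
        if ch in VOWELS:
            is_consonant = False
        elif ch == "y":
            # Y counts as a vowel only when it is preceded by a consonant.
            is_consonant = not (i > 0 and prev_is_consonant)
        else:
            is_consonant = True
        symbol = "C" if is_consonant else "V"
        if not pattern or pattern[-1] != symbol:
            pattern += symbol
        prev_is_consonant = is_consonant
    return pattern.count("VC")


# A small subset of the rules listed above: suffix S1 -> replacement S2.
M_GT_0_RULES = {"ational": "ate", "tional": "tion", "izer": "ize", "fulness": "ful"}
M_GT_1_RULES = {"al": "", "ance": "", "ement": "", "ment": "", "ent": ""}


def apply_rules(word: str, rules: dict[str, str], min_measure: int) -> str:
    """Rewrite the longest matching suffix if the remaining stem has m > min_measure."""
    word = word.lower()
    for suffix in sorted(rules, key=len, reverse=True):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if measure(stem) > min_measure:
                return stem + rules[suffix]
            return word  # the longest match failed its condition: leave the word as is
    return word


print(apply_rules("relational", M_GT_0_RULES, 0))  # relate
print(apply_rules("rational", M_GT_0_RULES, 0))    # rational (stem "r" has m=0)
print(apply_rules("revival", M_GT_1_RULES, 1))     # reviv
```

The "rational" case shows why the condition on m matters: the suffix matches, but the stem that would remain is too short, so the word is left untouched.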


Data-Driven Documents (D3.js)

For all visualizations in this project, with the exception of the word clouds and bar charts, I utilized D3.js, a JavaScript library for manipulating documents based on data.


Sentiment Analysis

For sentiment analysis, I implemented AFINN. AFINN is a list of English words rated for valence with an integer between minus five (negative) and plus five (positive).
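A lexicon-based score of this kind can be sketched in a few lines of Python. The sketch assumes that the score of a passage is the sum of the valences of its words, with unlisted words counting as zero; that aggregation rule is my assumption. The miniature lexicon is hypothetical and its values are not copied from AFINN, and the stemming step is left out for brevity.

```python
import re

# Hypothetical miniature lexicon in the AFINN style: word -> integer valence
# between -5 and +5. The entries and scores are illustrative only.
TOY_LEXICON = {
    "victory": 3,
    "joy": 3,
    "hope": 2,
    "fear": -2,
    "grief": -2,
    "destroyed": -3,
}


def score_text(text: str, lexicon: dict[str, int]) -> int:
    """Sum the valences of all words in the text; unlisted words score 0."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


print(score_text("Grief and fear filled the destroyed city.", TOY_LEXICON))  # -7
print(score_text("The people had hope of victory.", TOY_LEXICON))            # 5
```

Under this assumption, a long chapter with many emotionally charged words can reach a large positive or negative total, while neutral narrative stays near zero.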




Textual analysis lends itself well to visualization. Thus, the culminating products of the aforementioned analysis can be observed in the visualizations below.


Livy's Word Choice: First Pentad and Book 21













A boxplot demonstrating sentiment across the first pentad (English Translation).

Hover over each box plot for a brief summary of the book.


A zoomable line chart demonstrating sentiment across each chapter of each book of the first pentad (English Translation).

Scroll in and out of the line chart to zoom in on specific books. Hover over the lines or points to discern the specific book number.





The visualizations above provide insight into the essential questions of this project.

Essential Question 1:

In order to understand whether Livy's vocabulary is significantly redundant, I performed both intratextual and comparative analyses of Livy's word choice. For the intratextual analysis, I created word clouds to observe the evolution of Livy's word choice over time. It appears that the use of core function words of Latin, such as forms of sum, relative pronouns, and specific prepositions, remained fairly consistent throughout Livy's first pentad. Additionally, thematically relevant words, such as bellum and populus, remained prevalent in this initial part of the text. One interesting finding is the rise in the use of urbs over the first pentad, which carried into later parts of the text, such as Book 21. Potentially, this finding reflects the growth of Rome as an imperial center and highlights Livy's evolving depiction of Roman urbanization.

For the comparative analysis, I compared Livy with Tacitus (Annales Books 1-4 and the small surviving part of Book 5) and Sallust (Bellum Iugurthinum 1 and 2). After performing lemmatization on all of these texts, I selected the top 98th percentile of words in each of the texts in order to generate the bar charts. The top 98th percentile for words utilized in Livy's AUC contained 286 words. For Sallust, the top 98th percentile contained 89 words. Lastly, for Tacitus, the top 98th percentile contained 243 words. From this initial understanding, it appears that Livy had the greatest number of words in the top portion of the distribution. Overall, Livy used 5985 unique words, whereas Sallust and Tacitus used 3147 and 5266 unique words, respectively. Interestingly, each author seemed to favor three to four of the same words. For Livy, these four words were: sum, qui, is, and ut; for Tacitus: is, qui, sum, and ut; and for Sallust: is, qui, and sum. This finding is not surprising, as these words are expected to be greatest in number, given Latin sentence structure. Nevertheless, it is interesting that Sallust utilized ut less than Livy and Tacitus, potentially indicating a difference in clausal structures.
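One way to read the percentile cut is sketched below in Python: count how often each lemma occurs, find the 98th percentile of those counts, and keep every lemma at or above that threshold. The nearest-rank percentile definition and the function name are assumptions on my part, so the sketch will not necessarily reproduce the word counts reported above.

```python
import math
from collections import Counter


def top_percentile(lemmas: list[str], pct: float = 98.0) -> tuple[dict[str, int], int]:
    """Return (lemmas at or above the pct-th percentile of frequency, unique lemma count)."""
    counts = Counter(lemmas)
    freqs = sorted(counts.values())
    # Nearest-rank percentile: the smallest count such that at least
    # pct percent of all counts are less than or equal to it.
    rank = max(1, math.ceil(pct * len(freqs) / 100))
    threshold = freqs[rank - 1]
    top = {lemma: n for lemma, n in counts.items() if n >= threshold}
    return top, len(counts)


# Toy corpus: three frequent lemmas plus 97 lemmas that occur once each.
toy = ["sum"] * 50 + ["qui"] * 40 + ["is"] * 30 + [f"w{i}" for i in range(97)]
top, unique = top_percentile(toy)
print(sorted(top, key=top.get, reverse=True), unique)
```

On the toy corpus of 100 distinct lemmas the cut keeps only sum, qui and is, which mirrors how a small set of function words dominates the top of each author's distribution.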

Overall, Livy employed the greatest number of unique words, followed by Tacitus and then Sallust; however, the first pentad of Ab Urbe Condita was the longest text segment used in this analysis, given that the Tacitus selection comprised the first four books of the Annales and only a small surviving portion of Book 5. Moreover, the Sallust selection was the shortest of the three, comprising only the two parts of the Bellum Iugurthinum. Thus, there exists inherent bias in the text selection, given that the work with the highest number of unique words was also the longest. I was unable to develop a robust method to control for text length in this analysis. Still, from this work, it is apparent that Livy does not fall significantly behind his peers, as the Tacitus and Livy selections differed in length by roughly one book but exhibited similar unique word counts. It was also evident that as the length of the texts increased, the number of unique words increased. Potentially, this is related to the lemmatization process. The accuracy of the lemmatization could not be measured in this project, so it is unclear whether the lemmatizer contributed to this increasing trend or whether authors actually diversify their diction as they write more.

Essential Questions 2 and 3:

Overall, Livy's sentiment changes radically throughout each book in the first pentad. Sentiment generally stayed in the range of -30 to 30, but a few sentiment breakouts occurred. While no specific conclusions can be drawn from these sporadic trends, the data supports the body of literature that has established Livy as a dramatic historiographer who evokes emotion with his style. There are a couple of interesting sentiment breakouts to note:

Book 1, Chapter 34: Tarquinius Priscus comes to Rome and becomes very popular

This chapter had a high sentiment of 40. A high sentiment indicates frequent use of positively affiliated words, thus suggesting positive sentiment. In this chapter, Livy describes the rise of Tarquinius Priscus and details the journey of Tarquinius Priscus and his wife, Tanaquil, to Rome. Tanaquil is described as wealthy and high class, while Priscus is depicted as ambitious and eager to gain power. In this chapter, Livy also explains the omen of Priscus's cap, which is plucked from his head by an eagle, yet quickly returned. People thought that this omen signified Priscus's future kingship at Rome. The high sentiment obtained for this chapter suggests that Livy uses these depictions and events to evoke hopeful and excited feelings in the reader.

Book 5, Chapter 39: The Gauls, recovering from their astonishment at so easy a victory, reach the walls of Rome before nightfall. The Romans at any moment expect their attack. They take measures to protect their most sacred possessions and to defend the Capitol. (Livy Book V, R.I. Ross)

This chapter had a very low sentiment of -37. A low sentiment indicates frequent use of negatively affiliated words, thus suggesting negative sentiment. In this chapter, after being devastated by the Gauls, much of Rome flees to Veii and the remaining individuals withdraw to the Capitoline Hill. At this point in history, the city is completely destroyed and only grief and terror remain. Based on the sentiment analysis performed for this project, it appears that Livy fully capitalizes on the literary opportunity to evoke the pain of one of Rome's most tragic defeats, and does so convincingly.

There exist several additional instances of similar correlations between the underlying history and the sentiment calculated for a given chapter. These results suggest that the English translation of Livy's work employed in this analysis likely corresponds well to Livy's underlying dramatization in his history. Still, because English translations were utilized in this analysis, it is unclear how Livy's actual Latin text would relate to the content of each chapter; however, I expect that the relationship would be even stronger. Future research should develop parallel sentiment analysis capabilities, modeled on AFINN, for Latin words.


1. Canter, H. V. "Livy the Orator." The Classical Journal 9.1 (1913): 24-34. Web.

2. Ebeling, H. L. "Livy and Polybius: Their Style and Methods of Historical Composition." The Classical Weekly 1.4 (1907): 26. Web.

3. McDonald, A. H. "The Style of Livy." Journal of Roman Studies 47.1-2 (1957): 155-72. Web.

4. Moerk, Ernst L. "Quantitative Analysis of Writing Styles." Journal of Linguistics 6.2 (1970): 223. Web.

5. Steele, R. B. "Livy." The Sewanee Review 15.4 (1907): 429-47. Web.