Thursday, May 18, 2017

Author Attribution Analysis of the Brontë novels

Once in a while, we discover truth is stranger than fiction. Strange as it may seem, I contend, and science now supports my view, that Charlotte Brontë is the sole author of all the Brontë novels, which include those believed to be written by her sisters Emily and Anne. My findings are explored in detail in Charlotte Brontë's Thunder. Names matter, and Charlotte’s pen names enabled her to execute her literary deception.

In December 1847, when Wuthering Heights debuted on the London bookshelves, the author Ellis Bell appeared to be a relative, a brother perhaps, of Acton Bell whose novel Agnes Grey arrived in the shops at the same time. These two books had followed the publication of Jane Eyre: an Autobiography edited by Currer Bell. The public, curious at the sudden glut of Bells, wondered if the three were brothers or perhaps sisters, and even considered Acton and Currer ladies, and Ellis a man. They agreed no lady could have written Wuthering Heights because of its violent and harsh depictions. A suspicious reviewer, based on the novels’ similarity in sensibility, suggested one author wrote all three books; she found the books contain ‘singularly unattractive’ protagonists, and the writing presents a coarseness and brutality that combines ‘genuine power with such horrid taste.’[1]

The three Bell brothers were later unmasked to be sisters and, after their deaths, the public discovered that the sisters were Charlotte, Emily, and Anne Brontë.

Before their deaths, the highly respected Victorian literary critic Sydney Dobell deduced that Currer (Charlotte) wrote Wuthering Heights, Agnes Grey, and a later Acton Bell book, The Tenant of Wildfell Hall. He had found similarities between them and Jane Eyre. Interestingly, in the second edition of Wuthering Heights, issued after Emily and Anne had died, Charlotte as Currer Bell states in her ‘Biographical Notice of Ellis and Acton Bell’ that only one critic, Sydney Dobell, ‘discerned the real nature of Wuthering Heights.’[2] Was this an allusion to authorship? If ‘nature’ is defined as the identity as much as the essential character behind the work, perhaps she was providing a hint.

In the late 1840s, the editor who published Ellis Bell’s Wuthering Heights, and Acton Bell’s Agnes Grey and The Tenant of Wildfell Hall had also read a fourth Bell manuscript, one penned by Currer Bell and entitled The Professor. He rejected it for its lack of excitement, but was privy to important information: he had viewed all three Bell brothers’ works and, therefore, had seen the handwriting on their manuscripts. He sent The Tenant of Wildfell Hall to New York and told the publisher ‘he was about to publish the next book by the author of Jane Eyre, under her other nom de plume of Acton Bell—Currer, Ellis, and Acton Bell being, in fact, according to him, one person.’[3]

If Currer Bell was using pseudonyms, this revelation posed a problem for the author.

I believe Charlotte Brontë denied the one-author claim for pragmatic reasons: her Currer Bell novels were with a different publisher, and she understood the legal complications of having two publishers. In addition, as a safeguard, if she were to predecease her sisters, they could continue to receive royalties. Her denials seemed to end the controversy but, if she is the sole author, was there another reason for Currer Bell to use one or even two pseudonyms?

Virginia Woolf notes in A Room of One’s Own that women, like Currer Bell, needed to use pen names. The women were ‘victims of inner strife as their writings prove.’ They ‘sought ineffectively to veil themselves by using the name of a man’ because ‘[a]nonymity runs in their blood.’ But why use a male pen name? Woolf states that men considered women who sought ambition and fame to be ‘detestable,’[4] so Currer Bell would have preferred to protect her identity from the harsh judgment of a biased public. Or, she may have had the additional motive of copyright discrepancies as suggested above. If Sydney Dobell was correct and Currer Bell is the sole author of the Bell novels, how could one prove that Charlotte Brontë wrote Wuthering Heights, Agnes Grey, and The Tenant of Wildfell Hall?

One possible means of clarification would be the discovery of these long-lost manuscripts. If the pages were in Charlotte’s handwriting, as the early publisher stated, then readers would acknowledge her as the true author. Unfortunately, the only manuscripts missing from the Brontë oeuvre happen, coincidentally, to be those penned by Ellis and Acton Bell. Without the manuscripts, however, scholars could use the science of stylometry to determine authorship. Author attribution, or stylometry, uses computer software to analyze texts, such as literary works, and detect the distinct linguistic signature in an author’s writing style.

In 2013, a confidential informant tipped off a journalist that the author of the Harry Potter books, J.K. Rowling, is also the author of The Cuckoo’s Calling, a novel she had written under the pseudonym of Robert Galbraith. London’s Sunday Times asked a linguistics professor, Patrick Juola, to confirm the journalist’s suspicions. Juola has designed a computer program called the Java Graphical Authorship Attribution Program, or JGAAP, to recognize common but subtle writing patterns that are undetectable to readers. These include linguistic features such as common words, word lengths, and pairs of adjacent words. The program compared The Cuckoo’s Calling to Rowling’s The Casual Vacancy as well as three sample texts by three fiction authors: Ruth Rendell, P.D. James, and Val McDermid. The results pointed to J.K. Rowling as the author, and she immediately confessed that Robert Galbraith is a pseudonym.
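To make those three kinds of features concrete, here is a minimal Python sketch that counts them for a short passage. It is not JGAAP and makes no claim to reproduce how JGAAP tokenizes or counts; the function names and the sample sentence are my own placeholders.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def style_features(text, top_n=5):
    """Collect three simple style markers: common words, word lengths, adjacent word pairs."""
    words = tokenize(text)
    total = len(words)

    # Relative frequency of the most common words.
    word_counts = Counter(words)
    common_words = [(w, c / total) for w, c in word_counts.most_common(top_n)]

    # Share of words at each length, measured in characters.
    length_counts = Counter(len(w) for w in words)
    word_lengths = {n: c / total for n, c in sorted(length_counts.items())}

    # Counts of adjacent word pairs (word bigrams).
    word_pairs = Counter(zip(words, words[1:]))

    return {
        "common_words": common_words,
        "word_lengths": word_lengths,
        "word_pairs": word_pairs,
    }


# Hypothetical sample sentence, used only to show the shape of the output.
sample = "The cat sat on the mat and the dog sat on the rug."
features = style_features(sample)
print(features["common_words"][0])
print(features["word_pairs"].most_common(2))
```

On a real novel the same counts would be taken over thousands of words, and it is the overall profile, not any single word, that is compared between texts.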

Juola stated that author attribution, through his program, allows a curious amateur to gain results of novel analysis in a short time. One such amateur, writing in The New Yorker, was author Paul Collins, who used JGAAP to discover, after inputting several samples, that three pieces of prose previously credited to Edgar Allan Poe’s brother Henry seemed to be Edgar’s creations.[5] Juola noted that, ‘in the event that we were studying a long-dead author, this is the kind of thing that could and would be argued about in the journals for decades’[6] because the author is no longer available for comment.

Juola’s analysis of documents ‘has been recognized by the Plagiarism Action Network as one of the most accurate methods of authenticating authorship.’[7] He describes what his software does in this way: ‘What we are doing is the same type of judgment that experts have always done about reading documents and figuring out something about the author—just a lot faster, and more accurate than most.’[8] The software tool may not be able to proclaim certainty, but it can provide statistically valid evidence that suggests a particular author could have written the tested prose. When analyzed in conjunction with other evidence, however, the results could produce a higher degree of probability.

For the past several years I have been researching the Brontë novels in terms of authorship. Several clues had pointed me in this direction, specifically identical symbolic patterns, themes, riddles, and code in Jane Eyre and Wuthering Heights. Also, to the naked eye, these two novels share recurring syntax and diction with The Professor, Agnes Grey, and The Tenant of Wildfell Hall. One could argue these similarities are the result of the three sisters living under the same roof, but when placed beside other evidence, the probability that all three sisters wrote in such a similar manner decreases. A scientific analysis of the novels, therefore, could either quash or support serious consideration of my theory.

The evidence that strengthened my belief that Charlotte wrote Wuthering Heights and the Anne Brontë novels came from the only handwritten examples of Emily and Anne’s prose. Emily and Anne’s Diary Papers and Birthday Papers provide a sample of the quality of their writing. The two sisters wrote briefly about their lives on four separate occasions in 1834, 1837, 1841, and 1845. The prose, especially Emily’s, has challenged scholars to explain the disparity between this writing and her brilliant novel Wuthering Heights. One biographer states, ‘the dreadful handwriting and spelling are scarcely credible as the work of a highly intelligent sixteen-year-old.’[9]

Margaret Drabble, the novelist, biographer, and critic, acknowledges that the authorship of Wuthering Heights is still an unsolved riddle. Part of the problem comes from ‘how little we know of Emily,’ and that her diary papers ‘do not reveal her as a novelist.’ She adds, ‘There is something awkward and freakish about a girl of twenty-seven playing nursery games. The absence of the awkward in Wuthering Heights is stunning,’ so how did Emily ‘write a solid, elegant, original, beautifully constructed and firmly Yorkshire novel like Wuthering Heights,’ especially when her life was known to be ‘outwardly uneventful’? Unfortunately, the ‘dearth of information’ on Emily enables the book’s authorship to ‘remain a mighty enigma,’ and elicits efforts to solve the riddle of who really wrote this ‘work of genius.’[10]

In the existing papers, when Emily was almost nineteen, she misspelled Charlotte’s name twice as Charolotte and Charollote.  In the following excerpt, with errors intact, she is twenty-three: ‘It is Friday evening—near nine o’clock—wild rainy weather. I am seated in the dining room ‘alone’—having just concluded tidying our desk-boxes—writing this document—Papa is in the parlour. Aunt up stairs in her room—She has been reading Blackwood’s Magazine to papa—Victoria and Adelaide are ensconced in the peat-house—Keeper is in the Kitchen—Nero in his cage—We are all stout and hearty as I hope is the case with Charlotte, Branwell, and Anne, of whom the first is at John White Esq- upperwood House, Rawden  The second is at Luddenden foot and the third is I beleive at—Scarborough—enditing perhaps a paper corresponding to this.’

In July of 1845, very near the time she would have been writing Wuthering Heights, she writes, ‘. . . We are all now at home and likely to be there some time—Branwell went to Liverpool on ‘Tuesday’ to stay a week. Tabby has just been teasing me to turn as formerly to— ‘pilloputate.’ [peel a potato] Anne and I should have picked the black currants if it had been fine and sunshiny. I must hurry off now to my turning and ironing I have plenty of work on hands and writing and am altogether full of buisness with best wishes for the whole House till 1848 July 30th and as much longer as may be I conclude E.J. Brontë.’[11]

Scholars believe Emily is referring to the writing of the Gondal saga, one of the ‘nursery games’ that she and Anne had been involved with since childhood, but without prose examples, one can only deduce the narrative.

In Anne’s 1845 paper, she writes, ‘. . . This is a dismal cloudy wet evening  we have had so far a very cold wet summer—Charlotte has lately been to Hathersage in Derbyshire on a visit of three weeks to Ellen Nussy—she is now sitting sewing in the Dining Room  Emily is ironing upstairs  I am sitting in the Dining Room in the Rocking chair before the fire with my feet on the fender  Papa is in the parlour  Tabby and Martha are I think in the Kitchen  Keeper and Flossy are I do not know where  little Dick is hopping in his cage.’[12]

The two sisters show neither exceptional literary prowess nor inventiveness, and their prose is uninspiring.

Their letter writing is also sparse. Charlotte wrote letters that would fill three volumes, but only four of Anne’s letters have survived and Emily’s comprise just over three hundred words. Consequently, when I downloaded Professor Juola’s software program on stylometry, I had a limited number of words with which to compare Emily and Anne’s prose with Wuthering Heights, Agnes Grey, and The Tenant of Wildfell Hall.

I studied the stylometry guide and followed Prof. Juola’s instructions to Paul Collins on how best to configure JGAAP. I used a 5,000-word sample from Wuthering Heights and compared it to a 2,500-word sample of Charlotte’s novel The Professor, written in 1846, around the time scholars believe Emily wrote Wuthering Heights. Emily and Anne’s diaries are not ideal samples at 1,700 and 1,100 words respectively, but at least they are known exemplars of their writing.

I used all of Emily’s diary/birthday papers and Anne’s 1841 and 1845 papers. I then included three contemporary authors as distractors: 5,000-word samples from Jane Austen’s Northanger Abbey, Elizabeth Gaskell’s Cranford, and Harriet Martineau’s Deerbrook. I selected five kinds of features (words, word stems, and n-grams for parts of speech, characters, and words), and used a Most Common Events culler, the Centroid driver, and the Cosine, Histogram, and Manhattan distance functions. I asked the program to process the information and waited for the results: five feature sets measured with three distance functions, for fifteen outcomes.
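For readers who want to see the general idea behind a centroid comparison, the following Python sketch ranks candidate authors by how far their average word-frequency profile sits from an unknown text. It is an illustration under my own assumptions, not JGAAP's code: it uses only word frequencies, it implements only cosine and Manhattan distances (I have left out the Histogram distance rather than guess at its definition), and the author labels and toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_vocabulary(texts, size):
    """Keep only the `size` most frequent words across all texts (a simple culling step)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(size)]


def relative_freqs(text, vocabulary):
    """Turn a text into a vector of relative frequencies over the vocabulary."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words) or 1
    return [counts[word] / total for word in vocabulary]


def centroid(vectors):
    """Average several frequency vectors into one profile per author."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_authors(unknown, known, distance, vocab_size=50):
    """Return author labels ordered from closest to farthest from the unknown text."""
    all_texts = [unknown] + [s for samples in known.values() for s in samples]
    vocabulary = most_common_vocabulary(all_texts, vocab_size)
    target = relative_freqs(unknown, vocabulary)
    scores = {
        author: distance(target, centroid([relative_freqs(s, vocabulary) for s in samples]))
        for author, samples in known.items()
    }
    return sorted(scores, key=scores.get)


# Toy, made-up samples; real tests need thousands of words per author.
known = {
    "Author A": ["the sea and the sky and the shore", "the wind and the rain"],
    "Author B": ["a dog barked at a cat", "a bird sang at a window"],
}
unknown = "the hills and the moor and the heath"

for name, fn in [("cosine", cosine_distance), ("manhattan", manhattan_distance)]:
    print(name, rank_authors(unknown, known, fn))
```

Each combination of a feature type and a distance function gives one ranking of the candidates, which is why several settings together produce a set of outcomes rather than a single answer.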

I repeated the exact criteria for Agnes Grey and The Tenant of Wildfell Hall.

Juola explained that in order to establish the most likely author, I must look for the number of times Charlotte’s name appears in the top three rankings. The ideal situation is that her name appears first all the time, but the professor added that if her name comes out as first or second ‘almost every time,’ it’s ‘highly likely’ Charlotte is the author.
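The tallying rule described here can be sketched in a few lines of Python. The rankings below are placeholders with generic candidate labels, not my JGAAP output, and the helper names are my own; the sketch simply counts how often a candidate lands in each position and what share of the outcomes put that candidate first or second.

```python
from collections import Counter


def placement_counts(rankings, candidate):
    """Count how many times the candidate finishes in each position (1 = first)."""
    places = Counter()
    for ranking in rankings:
        places[ranking.index(candidate) + 1] += 1
    return places


def top_two_share(rankings, candidate):
    """Fraction of outcomes in which the candidate is ranked first or second."""
    places = placement_counts(rankings, candidate)
    return (places[1] + places[2]) / len(rankings)


# Hypothetical set of fifteen outcomes, each an ordered list of candidates.
rankings = [["X", "Y", "Z"]] * 10 + [["Y", "X", "Z"]] * 5

for candidate in ["X", "Y", "Z"]:
    counts = placement_counts(rankings, candidate)
    share = top_two_share(rankings, candidate)
    print(candidate, dict(counts), f"first or second in {share:.0%} of outcomes")
```

The rule gives no fixed cut-off for 'almost every time,' so the share is best read as a guide to how consistently a candidate rises to the top, not as a pass or fail score.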

In the case of Wuthering Heights, Charlotte’s name came up first in 15/15 categories; Harriet Martineau came second 8/15 times, Emily’s name came second 4/15 times, and Jane Austen’s 3/15 times. On these measures, Charlotte’s prose is the closest match among the six candidates to the prose in Wuthering Heights. Understandably, Emily’s prose fell short.

When I analyzed the Anne Brontë novels, the results were equally supportive of my hypothesis. For Agnes Grey, Charlotte took top spot 14/15 times, while Jane Austen came first 1/15. Anne’s name occurred twice in third place. For The Tenant of Wildfell Hall, Jane Austen took the top spot 10/15 times, with Charlotte coming second in those ten categories and first in the remaining five. Anne’s name was last 9/15 times and fifth 6/15 times. Jane Austen died in 1817, so she obviously was not the author of The Tenant of Wildfell Hall, but a brief manual analysis of Northanger Abbey and the Brontë novels showed a number of syntactical similarities in over 50 examples. This could mean either Charlotte borrowed from Austen’s style or Austen’s style was simply similar to Brontë’s at that time.

To test whether the program would recognize a Charlotte Brontë text, I set Jane Eyre as the unknown document and kept the distractors and the Brontë samples the same. Her name came out as the author 15/15 times.

Is this proof positive that Charlotte Brontë wrote all the Brontë novels?

Juola has stated that ‘modern computational linguistic technologies have produced a faster, more objective, and more scientific method of answering such questions on the basis of document statistics. If some notion of writing style can be objectively measured, it becomes possible to quantify exactly the idea that this document is likely to have come from this person. Just as every person has their own fingerprint or DNA, so every person has their own writing style, an extension of their personality and cognitive habits.’[13] He would never claim that his program is as certain as DNA, only that a set of markers can raise the level of probability.

The results suggest Charlotte Brontë wrote the novels; they are indicative of her being the author, but without her own confirmation, which J.K. Rowling was able to give, we can only speculate on the possibility. At the very least, the results of these author attribution tests could ignite spirited debate over authorship. Perhaps my suggestion that Charlotte is the sole genius in the Brontë family contains the elements less of fiction and more of truth.


[1] Elizabeth Rigby, ‘Vanity Fair—and Jane Eyre.’ Quarterly Review 84:167 (December 1848): 153-185. Accessed October 2015.
[2] Emily Brontë, Wuthering Heights. (London: Penguin Classics Edition, 2003) xlvi.
[3] George Smith, The Recollections of a Long and Busy Life. (1895); typescript in the National Library of Scotland, MSS 23191-2.
[4] Virginia Woolf, A Room of One’s Own. (London: Granada, 1978) 49.
[5] Paul Collins, “Poe’s Debut, Hidden in Plain Sight?” Accessed October 2015.
[6] Patrick Juola, “Rowling and ‘Galbraith’: an authorial analysis.” Accessed October 2015.
[7] Anya Sostek, “Duquesne professor helps ID Rowling as author of ‘The Cuckoo’s Calling.’” Pittsburgh Post-Gazette. Accessed October 2015.
[8] Steve Kolowich, “The Professor Who Declared, It’s J.K. Rowling.” July 29, 2013. Accessed October 2015.
[9] Juliet Barker, The Brontës. (New York: St. Martin’s, 1994) 221.
[10] Emily Brontë, Wuthering Heights. Introduction by Margaret Drabble. (London: Everyman Ltd., 1978) ix-xx.
[11] Emily Brontë’s Letters and Diary Papers. Accessed October 2015.
[12] Anne Brontë’s Birthday Paper July 30, 1841 and Diary Paper July 31, 1845. Accessed October 2015.
[13] Patrick Juola, “Computational Analysis of Authorship and Identity for Immigration.” Accessed October 2015.