This is a fascinating but ultimately futile use of multi-variate analysis to search for common characteristics in bestsellers. The authors are aware of the limitations of their method and point them out, but they still fall into them.
Essentially, a multi-variate text mining algorithm looks through thousands of books and extracts features which are characteristic of bestselling books, as well as features which tend not to characterise them. The authors do not claim this predicts good books, just popular ones. On some books it does well: against all reason and good sense, Fifty Shades of Grey and The Da Vinci Code were among the very big bestsellers, and this book shows which features of story arc, thematic balance and language they share with other bestsellers but not usually with other books in their genre. On the other hand, Harry Potter should not be a bestseller based on their model (as they admit), nor should the works of Tolkien or anything else which is science fiction or fantasy.
If you look at the top ten bestsellers of all time (using the Wikipedia list, for example, but it isn't that different from other lists I've seen), the books which have made it really big tend not to be the kind of books this model recommends. They feature fantasy (The Lord of the Rings; Harry Potter; The Hobbit; The Lion, the Witch and the Wardrobe; She; The Little Prince), non-domestic situations (And Then There Were None, A Tale of Two Cities) and a lack of topicality.
The problem with multi-variate analysis, as the authors admit, is that you tend to find what you are looking for. What this book doesn't do, and, again, the authors admit this, is find the causal link which produces bestsellers. In reality, the pattern it finds is just as likely to be an artefact of the book acquisitions process. Editors and agents look for certain features, and they only allow books with those features to get through the sieve. Harry Potter got through by the skin of its teeth on the back of extraordinary persistence by its author, which would tend to explain why it is an outlier. Once books are through the sieve (and a lot of what this book detects is that sieve), it's natural that those which lack particular flaws will do better than the more flawed ones.
If you're interested in computational linguistics, this is a nice book to browse through. If you want the secret to your bestseller, it isn't here.