Showing posts with label the crowd. Show all posts

Monday, December 05, 2011

Wikipedia and the Enlightenment

A debate is raging in Wikipedia about why their 'valued articles' are not very good, and hence not very valuable. You can follow it on their mailing list here. The list of 'Valued Articles', which is itself not very good because it omits many obviously vital articles (such as Theology), is here. The obvious answer (which seems to have occurred to no one) is that there is a shortage of editors who know about these subjects. Take one of the vital articles on their list, Age of Enlightenment. It begins
The Age of Enlightenment (or simply the Enlightenment or Age of Reason) was an elite cultural movement of intellectuals in 18th century Europe that sought to mobilize the power of reason in order to reform society and advance knowledge. It promoted intellectual interchange and opposed intolerance and abuses in church and state. Originating about 1650–1700, it was sparked by philosophers Baruch Spinoza (1632–1677), John Locke (1632–1704), Pierre Bayle (1647–1706), mathematician Isaac Newton (1643–1727) and Voltaire (1694–1778). Ruling princes often endorsed and fostered figures and even attempted to apply their ideas of government. The Enlightenment flourished until about 1790–1800, after which the emphasis on reason gave way to Romanticism's emphasis on emotion and a Counter-Enlightenment gained force.
This is horrible and clumsy and fails to explain what the Age of Enlightenment was really about. First of all, it was a period rather than a movement, which is in fact why it is called the 'age' of enlightenment. It lasted from about 1740 to 1780, although its ideals persisted long after that, and I would like to think, or hope, that this blog embodies some of them.

Secondly, the Enlightenment is essentially a set of values shared by prominent writers and thinkers of the period. Their guiding principle was that the increase of knowledge, the use of reason, and the application of the scientific method would improve the condition of humankind. Its outlook was a belief in the possibility of progress: human beings are essentially good, and people can better themselves and society by education and the application of reason.

The introduction to the Wikipedia article doesn't really get us there at all. I'm not sure that enlightenment thinkers regarded themselves as an 'elite'. One of their basic principles was that reason is the property of all humankind, and not of some self-elected elite. To say the movement 'promoted intellectual interchange' may well be true, but is true of many other movements. The fourth sentence, about princes "endorsing and fostering figures" and "applying their ideas of government", makes no sense. And the final sentence is horribly 1066-ish.

The problem is simple: the theory of "crowdsourcing" doesn't work. According to this theory, 'anyone can edit' an article, and by some Darwinian process the good edits will survive and the bad ones will perish and become extinct, and so we will end up, after 10 years of Wikipedia, with an absolutely perfect and flawless article about the Age of Enlightenment.  That theory is clearly false. It is a crime that the gang of illiterates who have taken over Wikipedia should have let such a noble project - one which should have been the very embodiment of the ideals of the Enlightenment - languish and decay so lamentably.

Sunday, November 13, 2011

Richards on popular culture

Researching attitudes to pop culture before the 1960s I came across this comment by I.A. Richards*
With the increase of population the problem presented by the gulf between what is preferred by the majority and what is accepted as excellent by the most qualified opinion has become infinitely more serious and appears likely to become threatening in the near future. For many reasons standards are much more in need of defence than they used to be.  It is perhaps premature to envisage a collapse of values, a transvaluation by which popular taste replaces trained discrimination. Yet commercialism has done stranger things: we have not yet fathomed the more sinister potentialities of the cinema and the loud-speaker, and there is some evidence uncertain and slight no doubt, that such things as 'best sellers' (compare Tarzan with She), magazine verses, mantelpiece pottery, Academy picture, Music Hall songs, County Council buildings, War Memorials ... are decreasing in merit.  Notable exceptions, in which the multitude are better advised than the experts, of course occur sometimes, but not often.
Note that the Wikipedia article on Richards provides more evidence for my theory that most of Wikipedia was written by 2007, and that it was written by a small number of people more in the manner of a conventional encyclopedia than by 'crowdsourcing'. The current article differs little from the version of July 2005 – subsequent changes are mere alterations to format, linking and 'wikifying', and it was entirely written by someone editing from this IP address.

*Principles of Literary Criticism, 1924, republished by Routledge Classics 2001, p. 31.

Sunday, October 30, 2011

Repetitive labour and Wikipedia

In an effort to understand the different ways in which 'editors' contribute to Wikipedia, I have been using this tool to survey the average number of edits per page of the current 726 active administrators on the English Wikipedia.  I completed this rather tedious task this morning.

The result is that there is a wide range of edits per page from a low value of 1.21 at one extreme, to a high value of 20.51 at the other.  The value distribution, as a percentage of the sample, is shown in the table at the bottom.

What does this mean?  Clearly an editor with a very low edit/page count will be spending very little time on individual articles. The limiting value is 1, meaning that an editor  never returns to a page once they have done something to it.  The upper value is limited by the maximum number of edits to any single article, and will occur if one editor entirely wrote that article, with no help whatsoever.
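The metric itself is simple arithmetic, and a minimal sketch in Python may make the limiting value concrete. Everything named here is an assumption for illustration: the function `edits_per_page` and the two sample edit histories are invented, and the sketch assumes an editor's history is available as a plain list of page titles, one entry per edit, which is not necessarily how the survey tool works.

```python
def edits_per_page(edited_pages):
    """Average number of edits per distinct page, given one page title per edit."""
    if not edited_pages:
        raise ValueError("editor has no edits")
    return len(edited_pages) / len(set(edited_pages))


# Hypothetical editor who never returns to a page: the limiting value of 1.
drive_by = ["Page A", "Page B", "Page C", "Page D"]

# Hypothetical editor who keeps returning to the same two pages.
focused = ["Page A"] * 7 + ["Page B"] * 3

print(edits_per_page(drive_by))  # 4 edits over 4 pages
print(edits_per_page(focused))   # 10 edits over 2 pages
```

The first editor scores exactly 1, the floor described above; the second scores 5, because ten edits are spread over only two pages.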

What else can we say?  My main question is the different value contributed by editors with radically different edits per page.  Are the contributions of those with a high count, of a higher or lower value than of those with a low count?  Before you leap to conclusions, consider the following thought experiment.  Suppose that the article on Caspar David Friedrich, which is not a bad article, and is indeed a Wikipedia 'Featured Article', had been written by about 1,700 different editors.  Thus (since there have been about 1,700 edits to this article) each editor would have contributed no more than one edit.  The article would have grown to its present good quality entirely from the separate and probably disconnected contributions of the different editors.  And then extend the thought-experiment by supposing that all the Featured Articles - which are supposed to be the very best quality that Wikipedia has to offer - were written in this way.

As a limiting case, suppose there are 1,000 Featured Articles, and only 1,000 editors working on them, that each has 1,000 edits, and each editor has edited each article exactly once. It is theoretically possible that all Featured Articles grew to their currently 'good' state by such a process. In that case, edits per page would not be a good metric to determine whether the editor was what Wikipedians call a 'content contributor'. All editors would be 'content contributors', but they would distribute their content thinly and evenly across many different articles. This would be the 'classic crowdsourcing' that I discussed in earlier articles such as this.
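The limiting case can be checked mechanically. The following is only a sketch under stated assumptions: the function `limiting_case` is invented for illustration, edits are modelled as bare (editor, article) pairs, and the numbers are scaled down from 1,000 to 50 so that it runs instantly.

```python
from collections import Counter


def limiting_case(n):
    """Simulate n editors who each edit every one of n articles exactly once."""
    edits = [(editor, article) for editor in range(n) for article in range(n)]
    edits_per_article = Counter(article for _, article in edits)
    edits_per_editor = Counter(editor for editor, _ in edits)
    # Distinct pages touched by each editor (duplicate pairs collapsed by set()).
    pages_per_editor = Counter(editor for editor, _ in set(edits))
    ratios = {e: edits_per_editor[e] / pages_per_editor[e] for e in edits_per_editor}
    return edits_per_article, ratios


# Scaled down from 1,000 to 50; the arithmetic is the same.
articles, ratios = limiting_case(50)
print(set(articles.values()))  # distinct edit counts across articles
print(set(ratios.values()))    # distinct edits-per-page ratios across editors
```

Every article ends up with as many edits as there are editors, yet every editor's edits-per-page ratio is exactly 1. In such a world the metric could not distinguish one editor from another.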

But this is clearly not the case. More research is needed, but there are several bits of evidence suggesting that when 'value' or 'content' means the sort of quality assessed by the Wikipedia 'Featured' or 'Good' article assessment, it is editors with a relatively high edits-per-page count who contribute this. For example, look at the page here which tells us who contributed to the Caspar David Friedrich article. Three editors stand out, namely Ceoil (8.43 edits per page), Modernist (8.07) and Fpenteado (9.29). Not only did these editors contribute significantly to this article, they contributed significantly to many other articles on Wikipedia.

Another piece of evidence is the type of contribution made by those with low edits per page. For example, the lowest edits-per-page count in my sample belonged to Andre. If you look carefully at what he is doing, he is simply adding links to articles on the Estonian Wikipedia, something which he seems to have been doing for a very long time. That doesn't mean he is not adding something of value to Wikipedia, but you clearly couldn't build an article like the one on Caspar David Friedrich simply by adding links to the Estonian Wikipedia. Or consider the contribution history of 'Gaius Cornelius'. He is using what is called a 'bot' on Wikipedia, i.e. a robot or mechanised editing tool. As you see from its description here, it is a tool 'designed to make tedious and repetitive tasks quicker and easier'. This is mainly formatting and linking to other articles. Again, this doesn't mean he and his robot are not adding some sort of value to Wikipedia, but it's clearly not the sort of value that could build an article like the one on Caspar David Friedrich.

Now we could go further and bite that very difficult bullet: what is the economic value of the different contributions? That is, what would be the market value of the labour corresponding to the different edits per page? There are a number of considerations here, and please note I am not an economist. The first is that if the quality of articles were the prime objective of the project, where 'quality' is measured by the Featured Article process, you would want to attract more 'content contributors' to increase quality. Second, given that the table below suggests that content contributors are scarcer than mechanical contributors, you would want to pay more to the content contributors. Finally, the principle that repetitive labour is easily learned, and thus less well paid than labour whose skill is difficult to acquire, would suggest paying the content contributors more, perhaps much more. Which is the case in conventional encyclopedias, of course, where the bulk of the work is done by poorly paid penny-a-liners, often using custom-built databases such as Crystal, and the remaining 'flagship articles' are commissioned from skilled subject-matter experts for a premium fee.

This raises the question of why content contributors exist on Wikipedia at all, but that's a subject for another discussion, and I have rambled on enough for today.

By the way, Beyond Necessity is approaching a record number of page views this month.  3,848 views to today, compared to 3,490 last month, and looking to hit the 4,000 barrier by the end of this month. So, please feel freer than usual to click on some of the internal links here.  With best wishes to all.


Edits per page     Percentage of sample
1-2                22%
2-3                44%
3-4                18%
4-5                8%
5-6                4%
6-7                2%
7-8                1%
greater than 9     1%

Wednesday, October 05, 2011

Crowdsourcing philosophy

I am starting work on the Wikipedia book.  One of the central themes will be the ideology of crowds: are they mad, or are they wise?  Probably a little bit of both.  Wikipedia is good at handling matters of detail. But as I have said before, if Wikipedia tried to write the decline and fall of the Roman empire, which requires assembling the right 'little facts' in the right order, and placing a narrative around these, the result would be very bad.  I periodically return to the philosophy article itself, looking for evidence of progress.  Here it is at the end of Wikipedia's first year of existence.
The definition of philosophy is a philosophical question in its own right. But for purposes of introducing the concept, we can say that, approximately, it is the study of the meaning and justification of beliefs about the most general, or universal, aspects of things--a study which is carried out not by experimentation or careful observation, but instead typically by formulating problems carefully, offering solutions to them, giving arguments for the solutions, and engaging in dialectic about all of the above. Philosophy studies such concepts as existence, goodness, knowledge, and beauty. It asks questions such as "What is goodness, in general?" and "Is knowledge even possible?" Some famous philosophers include Plato, Aristotle, Rene Descartes, John Locke, and Immanuel Kant.
The article lacks any of the formatting that Wikipedia developed later, and there are no pictures, and it is short. But the definition is as good as you are likely to get for such an abstruse and difficult subject. As it points out, the question "What is philosophy?" is itself, famously, a vexing philosophical question. That was probably the high point of the article, more than ten years ago. It has had some spectacular low points, in particular here, when two rather deranged editors took control of the article ("As a consequence of the collapse of colonialism and imperialism in the twentieth century, philosophy now is classified according to three major geographical regions, Western philosophy, Eastern philosophy, and African philosophy"). The worst degradation is prevented largely because of two academically trained editors who try to take care of it. However, it seems to be reaching a low point again. Someone has arranged the article around geographical headings, which makes no sense. As one of the better editors remarks on the talk page:
I notice some editor(s) have hamhandedly integrated the history sections with the previous "Geographical" sections of the article. Since the geographical sections were very poorly written (i.e., terribly sourced, tendentiously written, riddled with dubious claims, huge WP:UNDUE problems), this has the net effect of seriously degrading the quality of a half-decent section of the article. Can we revert to the prior organization, or substantially rewrite the entire section to repair these huge problems? To put it simply: if you open almost any reference book on philosophy, or encyclopedia article on philosophy, you will see in the corresponding "history" section a far, far better treatment than the eyesore this article is currently burdened with. And such treatments will be substantially closer to the previous "history of western philosophy" section than the current revision. 271828182 (talk) 00:13, 16 September 2011 (UTC)
Take a look at a previous version, ... compare it with the current version, which is barely coherent. Or, as I suggested, compare it to virtually any "history" section of a competent encyclopedia article or reference source on philosophy. The "non-western" sections have always been rubbish, and this just embeds the rubbish front and center. 271828182 (talk) 22:29, 16 September 2011 (UTC) 
Quite.  It makes little sense to organise philosophy geographically.  It is a single subject with a single tradition that begins with the Greeks, passes to the Romans, and to the Western medieval philosophers by way of North Africa, Persia, Moorish Spain and many other places.  The geography and history are interesting, but incidental to the subject matter. As Larry Sanger (who wrote the 2001 version referenced above) wrote in 2004
One has only to compare the excellent Stanford Encyclopedia of Philosophy or The Internet Encyclopedia of Philosophy to Wikipedia's Philosophy section. From the point of view of a specialist, let's just say that Wikipedia needs a lot of work. (Wikipedia Must Jettison Its Anti-Elitism by Larry Sanger  Kuro5hin, Fri Dec 31, 2004)

Thursday, September 22, 2011

Collective wisdom

There was a huge burst of traffic yesterday from Crooked Timber discussing the wisdom of crowds.  Someone linked to my Wikipedia posts here, so I return the favour.  As well as the posts on this blog, there is an article I wrote for the Skeptical Adversaria, of which a copy is on the web here. I have argued many times that crowdsourcing can work well for items of 'hard' knowledge - easily verifiable facts of the sort you would find in an almanac, scientific constants, domains subject to clear proof, such as mathematics.  For the humanities, and in general for any abstract subject that requires thoughtful summarisation, it is a disaster.  Enough said.

Wednesday, September 14, 2011

What’s up at the Logic Museum

The Logic Museum is now a wiki, although a closed one, meaning not everyone can edit.

It’s still in the experimental stage. It uses Semantic MediaWiki, which means pages can be tagged and sorted in the database. This page shows the kinds of queries that can be run. And it includes a text editor that deals with tables better than a standard wiki: parallel non-English vs English texts are a key feature.

The principles of the project are set out here, but essentially it is all about bringing key texts to a wider audience, in two ways.

(1) Specialists in medieval philosophy recognise the difficulty of obtaining sources even in Latin editions. Critical edition projects like Bonaventura and Vatican have a limited print run, and not all libraries purchase these. I have access to the finest libraries in London, including the Warburg, which specialises in medieval and renaissance texts, and the Heythrop, which has a separate theology and philosophy library. Even these are missing some of the texts I would like to read, including Mazzarella’s edition of Simon of Faversham, and Scotus’ Quodlibeta. (The British Museum would certainly have copies, but I have so far avoided this institution as a result of previous experience). So a project that brings Latin texts to the Internet would be useful even to specialists.

(2) The second way involves translating these texts into English thus bringing them to a much wider audience.

The technical problems of the Logic Museum are now pretty much solved. The problems of getting it to work as a collaborative project are only starting. Wikipedia proved that crowdsourcing worked to a certain extent – although many of my posts here have been critical of the project, I still strongly believe it achieved something worthwhile and important. However, Wikipedia relies mostly on unskilled volunteers. By contrast, apart from document scanning, most of the work involved in putting the Logic Museum together requires some sort of specialist skill. There is still no digitiser that understands Latin spelling and grammar. Thus a typical raw output looks like this. Correcting these texts means human spell-checking. Translating the texts into English requires a higher level of expertise. It’s not the grammar which is difficult. Rather, philosophical Latin employs a number of technical terms which are unintelligible even to a specialist in classical Latin. E.g. ‘dicuntur de quolibet’, which means nothing to someone brought up on the Latin of Cicero and Vergil.

Of course there is a large pool of specialist expertise in philosophy and theology departments across the world. But here you have the problem that crowdsourcing is a volunteer activity, whereas academic specialists depend for their career on publications in recognised sources. Actually they volunteer for that also – no one is paid for their contributions to journals, or for published books. The key is ‘recognised source’. Until someone can put on their CV that they have had a Logic Museum translation accepted, it is unlikely that the project will attract much interest from specialists. There is no reason in principle why this should not happen – think of the Logic Museum as potentially a sort of publishing house which has a review and acceptance process identifying which individual made which important contribution to the project.  But setting this up in the right way requires more thought, and more work.

Tuesday, September 06, 2011

Who writes Wikipedia?

I hacked together a tool to determine the size and date of each past version of any Wikipedia article, and to chart size against date to determine the growth of the article through time. Then I looked at a sample of articles from the Time ‘100 best English language novels’ to determine how these grew through time. The study would need to be formalised and extended if continued to publication, but the initial results are surprising.

I wanted to test the idea discussed here. The official ‘crowdsourcing’ doctrine of Wikipedia is that editors are easily replaceable units of work, each of whose contributions are equally valuable. Supposedly, large numbers of small edits will, over time, make an article 'drift' towards quality and accuracy, even if each individual edit only improves the article imperceptibly. This philosophy has determined the way Wikipedia is administered. Those who perform purely administrative work – categorising, formatting and (mostly) vandal fighting – are rewarded by promotion within the hierarchy. Those who produce content, by contrast, receive no formal recognition, on the crowdsourcing assumption that no one person can be identified with any single article, and that content producers are replaceable anyway.

My study flatly contradicts this official doctrine. Growth in a genuinely crowdsourced article would look like Brownian motion with upward drift, as thousands of minor edits gradually ‘stick’ in a Darwinian competition for survival. This is by no means the case. In the majority of articles sampled, there is a pronounced ‘staircase’ appearance to the growth of the article. The size increases rapidly, often within a day and a handful of edits. Then it flattens as the changes stabilize, with minor growth for months or years. Then another editor (or often the same one) adds more content and the size grows rapidly, to be followed by another flat period and so on. It is not unusual for an article that has had thousands of edits to have reached its current size through only a handful of real edits. The majority of the other edits are vandalism followed by reversion of vandalism, or minor formatting changes or adding of categories. Many articles have effectively only one editor.
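One way to put a number on the difference between a 'staircase' and a steady drift is to ask what share of an article's total growth comes from its few largest single edits. The following Python sketch is an illustration only, not the tool described above: the function `growth_share_of_top_edits` is a hypothetical name, both size histories are made-up numbers (in bytes), and a deterministic series of equal small steps stands in for genuinely random drift.

```python
def growth_share_of_top_edits(sizes, top_n=2):
    """Share of total positive growth contributed by the top_n largest single-edit increases."""
    increases = sorted(
        (after - before for before, after in zip(sizes, sizes[1:]) if after > before),
        reverse=True,
    )
    total = sum(increases)
    if total == 0:
        return 0.0
    return sum(increases[:top_n]) / total


# Made-up 'staircase' history: two large additions amid small tweaks and reverts.
staircase = [500, 520, 510, 530, 9000, 9010, 9000, 9020, 9015, 24000, 24010]

# Made-up 'drift' history: the same overall growth in 470 equal steps of 50 bytes.
drift = list(range(500, 24001, 50))

print(growth_share_of_top_edits(staircase))
print(growth_share_of_top_edits(drift))
```

For the staircase history the two largest edits account for more than 99% of the growth; for the drift history they account for less than 1%. A real study would of course need real revision histories and a more careful treatment of vandalism and reverts.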

Another observation is that most of the growth occurred in the period from 2004 until 2007-8. What explains this? It is well known that the overall number of Wikipedia editors has been decreasing since then. One theory is that Wikipedia is ‘full up’. Most of the ‘useful knowledge’ has already been captured. So is it that each of the articles about the ‘100 greatest novels’ reached its optimum or ideal length in 2007, and no further work is needed? No. Most of the articles in this series are short – about 10k bytes. But some are longer, and a handful are as large as 80k (which is the longest length an article should be, for practical purposes). So most of the articles are well below the length they could be: Wikipedia articles on great novels are not ‘full up’.

Then could it be that articles about novels have an optimum size, determined by their notability? Well, no. One of the longer articles (60k) is about Hemingway’s classic The Sun Also Rises. This is indisputably a great work, possibly Hemingway’s greatest work. But is it any more notable than Great Expectations, whose article weighs in at a mere 40k? Or Pride and Prejudice, universally acknowledged to be one of the great classics of English literature (a paltry 36k)? Of course not. The article on Hemingway’s book was written by a single Wikipedia editor, and was a mere stub before he or she got to work on it. Given the small number of editors who work on these articles, a large article reflects an interest by some editor who put in a lot of work to make it that way, rather than genuine notability. A small article is the result of mere chance.

I shall publish some of these charts in subsequent posts.

Thursday, September 01, 2011

Masks at the masked ball

There is an interesting discussion here between some of Wikipedia’s dwindling number of competent editors. The complaint is the usual one about the Wikimedia Foundation's obsession with the overall number of editors, rather than the quality of work that editors produce. “Certain people associated with the Foundation have been saying for years that it doesn't matter who makes the edits; we are all just masks at the masked ball, and what matters is numbers alone. I suspect they'll start to see the folly of that position, though it may take a few years”.

I suspect the Foundation will not see the folly of that position, because of its deep-rooted faith in crowdsourcing. The official teaching of the church of Wikipedia is that editors are easily replaceable units of work, each of whose contributions are equally valuable. Supposedly, large numbers of small edits will, over time, make an article 'drift' towards quality and accuracy, even if each individual edit only improves the article imperceptibly.

But as one of the editors comments (and I am certain she is right) “when the history of Wikipedia is written, we're going to be astonished by the small number of people who created and maintained it”. Another agrees that “crowd-sourcing is largely irrelevant, as most articles are edited by very few editors, often only two or three, which is hardly a crowd, and almost all of the content often comes from only one or two editors”.

This is almost certainly true. Good articles are written by a handful of good editors. The same is true of bad articles, which are nearly always the result of a single spectacularly incompetent and inept author writing a personal essay about a favourite subject. See e.g. Intellectual history (“The concept of the intellectual is relatively recent”), almost entirely written by this chap. Or how about the really awful History of Europe, quite an important subject, you would think, and deserving of thoughtful treatment, but which contains such gems as “During this time many Lords and Nobles ruled the church. The Monks of Cluny worked hard to establish a church where there were no Lords or Nobles ruling it. They succeeded”.

Nor does the crowd always pick up even the easily correctable mistakes. In May 2005, someone claimed that Belgian businessman Georges Jacobs (born in the late 1940s) was a commander of the Waffen SS's 28th SS Volunteer Grenadier Division Wallonien (presumably disbanded in 1945). The claim was removed only recently (August 2010), and so had been there 6 years without any of the ‘crowd’ noticing.

But then why are the masks complaining, while they continue to edit Wikipedia?  The whole experiment only appears to work while these poorly rewarded individuals fight a losing battle against a tidal wave of vandalism and illiteracy.  They need to stop fighting, get a life, and watch Wikipedia work out a solution for itself.  (And if Connolley is reading this, that means him too).

Friday, May 27, 2011

Can crowdsourcing make all articles excellent?

Ex-climate scientist William Connolley questions the logic of an earlier post, saying that

I'm not really sure exactly what claim about crowdsourcing you are calling false. If the claim is, crowdsourcing makes all wikipedia articles excellent, then it is trivially false. If the claim is, crowdsourcing is capable of creating excellent articles, then it is trivially true. Probably you mean something else, but what?
Well, as to exactly what claim of crowdsourcing I was originally calling false, in this post I cited the articles on Durandus and Roscellinus as evidence against the claim that crowdsourcing makes Wikipedia "instantly responsive to new developments". The fact that these articles are entirely plagiarised or 'copied' from the 1913 Catholic Encyclopedia and Britannica 1911 suggests that Wikipedia is somewhat sluggish on the 'new development' front.

On the other possible claims that William mentions, I disagree that "crowdsourcing makes all wikipedia articles excellent" is trivially false. Since we agree it is false, it follows that crowdsourcing fails to make certain articles excellent, and it is an interesting, and therefore non-trivial, question whether there are certain types of article, or certain types of information that crowdsourcing fails to make excellent, and if so why.

I don't propose to answer these non-trivial questions here - I merely point out that they are obviously non-trivial. I have made suggestions in the past. I suggested that "crowdsourcers are typically shy of deleting material, so articles tend to grow to the point of being unreadable. Second, they have no sense of where material ought to go. So the article tends to lose any basic thread it once had. Third, they have no sense of which facts to include, and which to leave out. What facts about Aristotle would you include in a three page article?". I have also observed that, as a general rule, Wikipedia's coverage of subjects like Boron and set theory is pretty good. On the arts and humanities it is a complete disaster. As Vaknin says, "they are replete with nonsense, plagiarism, falsities, and propaganda".

So it's an interesting question as to whether the poor quality of arts subjects is simply a matter of accident, and could have been the other way round. Another interesting (and therefore non-trivial) question is whether crowdsourcing is better at 'low culture' than 'high culture'. My view is that it is pretty good at articles like this, but really dreadful at articles like this. More later.

Wednesday, May 25, 2011

Plagiarism from old sources

I had a mild disagreement with climate scientist William Connolley here on my use of the term 'plagiarism'. He objected that the articles in question did mention that they 'incorporate' material from the Catholic Encyclopedia. This is a matter of semantics. To my mind the word 'incorporates' suggests 'proper subset of' rather than 'equals', but let's pass over that. It matters in a very real way that the magic of crowdsourcing is little more than indiscriminate copying of the scholarship of a century ago.

The article Durandus misses all the medieval scholarship that happened from around 1913 until now.  In the case of medieval studies, that is quite a lot.  The discipline did not really get started until the nineteenth century, and much of the primary source material - the works themselves - did not become available until well into the twentieth century.  (Indeed, the work that Jack and I are currently translating did not become available in a critical edition until a few years ago, and has never been published in English). 

Thus the Wikipedia article necessarily misses some important facts.  For example, that Durandus was one of those assigned by John XXII to investigate Ockham’s nominalism.  Or the centrality of his work on the category of relation – first highlighted by Koch in 1927.  The article does not even mention the dates of Durandus’ work (the first Sentences commentary between 1303-8, the second between 1310-12) and omits to mention a number of his other works. As any scholar knows, assigning a date to a source is of crucial importance, and is a painful business.  Wikipedia, which is good at basic facts, and lists of things, would be an ideal source of such information.  But it isn't.

You will say I am knocking Wikipedia again.  But the point underlying this is not to knock Wikipedia, but rather a false claim about 'crowdsourcing'. We should be thanking the scholars of 100 years ago for Wikipedia, not the crowd (or Jimmy Wales, who is supposed to have invented the whole thing).

Wednesday, May 11, 2011

Derrida and Wikipedia

Someone has left an interesting note on Jimmy Wales's Wikipedia talk page, about the article on the infamous French philosopher Jacques Derrida. I won’t copy the note as it is quite long and you can read it yourself if you follow the link. But I will summarise it here, as it captures well one of the fundamental problems of Wikipedia.

  1. This is a very important article about one of the most influential philosophers of the 20th century.
  2. It would be a great thing if Wikipedia had produced a good article on him.
  3. It hasn’t. The article is really really awful.
  4. Worse than that, if any competent person has actually improved it in the past, the article soon degrades (the commenter gives the current version and the version from one year ago to prove this).
  5. Perhaps there is something fundamental in the very structure of Wikipedia itself that prevents it from reaching even basic levels of competence about topics such as this?
Having looked at both versions of the article, I tend to agree with him or her. I am not an expert on French philosophy. But the problem with the current article is not a matter of philosophical expertise, but of communicating some difficult ideas on a broad subject in a small amount of space, to a reader who has no expertise in or knowledge of the subject.

The current version shows all the typical weaknesses of crowdsourcing. First, crowdsourcers are typically shy of deleting material, so articles tend to grow to the point of being unreadable. Second, they have no sense of where material ought to go. So the article tends to lose any basic thread it once had. Third, they have no sense of which facts to include, and which to leave out. What facts about Aristotle would you include in a three page article? They don’t know this, so the article tends to move awkwardly from wide, sweeping, often 1066-ish statements about the world and the universe, to what football team the subject supported, or what he had for breakfast in March 1969.

Tuesday, March 22, 2011

Crowdsourcing again

A problem that I discussed earlier has now been escalated to Wikipedia’s ‘Arbitration Committee’. An editor called 'Jagged 85' had been systematically falsifying material in Wikipedia since he (or she) joined in 2005. The editor had a clear and consistent anti-Western agenda, systematically distorting source material in a way that untruthfully promoted Islamic and other non-Western intellectual achievements, usually by claiming that a scientific development, invention or discovery was made or anticipated by some non-Western philosopher or scientist. A large amount of material was affected in Wikipedia, which is widely used as a reference work by millions of people, who trust it as a reliable source. The editor contributed to 8,115 pages, making 63,298 edits. Much of the problematic material seems still to be there.

The ‘case for the prosecution’ cites another example of this style of editing. The article “List of inventions in medieval Islam” contains the following assertion:

Central heating through underfloor pipes: The hypocaust heating system used by
the Romans continued to be in use around the Mediterranean region during late
Antiquity and by the Umayyad caliphate. By the 12th century, Muslim engineers in
Syria introduced an improved central heating system, where heat travelled
through underfloor pipes from the furnace room, rather than through a hypocaust.
This central heating system was widely used in bath-houses throughout the
medieval Islamic world.
The claim is cited, but the cited author, Hugh N. Kennedy, writes something rather different:

In one respect, however, the early Islamic bath had more in common with the
classical one than with the later Islamic. Late antique and Umayyad bath
builders continued to use the hypocaust, though on a reduced scale, for
heating the hot chamber, whereas later Muslim baths used a simpler
system of underfloor pipes from the furnace room.
One of the arbitrators expressed surprise at the request, having been under the mistaken impression that Jagged 85 had been banned. “Who could have known that someone could get away with such behaviour on Wikipedia with only a single 24 hr edit-warring block”, he says. Yet there seems little chance that the committee will do something about the problem. Its terms of reference do not include ‘content dispute’. And there seem as many friends and supporters of the disputed editor as there are people who are concerned about the situation. On Wikipedia, which sources material from the crowd, anyone’s view counts the same as anyone else’s. Wikipedia: the encyclopedia that anyone with an agenda can edit.

Tuesday, November 09, 2010

Wikipedia admits plagiarism

Yesterday's post saw a burst of traffic from Wikipedia, of all places. The fuss over this particular incident will die down, but there are two important principles to be gathered from this.

The first is that most of the people now on the project do not possess the skills to develop an accurate and comprehensive reference work. As we just saw, a senior person on the project has defended his outright plagiarism by admitting he cannot write. Another of his colleagues agrees, making the extraordinary statement that "plagiarism … is not only rampant, it is a standard editorial practice throughout the project". Crowdsourcing doesn't work.

The second is the lengths the administration will go to stifle any legitimate criticism of the project. An editor (not me) who raised an earlier alarm about plagiarism was promptly blocked by the same person who said that plagiarism is now standard practice. The edit history of the plagiarist has been erased - if you click on the links in my previous post you will see they no longer work. Any discussion of the incident has been stifled. A long-term content contributor complained about this and was promptly blocked as well. More details here.

Of course, all organisations have the tendency to stifle debate and legitimate criticism. But Wikipedia takes this to extremes, which is especially ironic given its commitment to 'free culture' and 'open source'. Or is it? When being free and open really means stealing, it's difficult to be free and open, isn't it?

Thursday, November 04, 2010

The best way for a pressure group to spend its time

"The more I think about it, the more it occurs to me that this [Wikipedia] is the best way for a pressure group to spend its time" Daniel Hannan. I pointed out as much here.

Saturday, July 17, 2010

Ochlocracy

Ochlocracy. It means 'mob rule' or perhaps 'crowdsourcing'. I'm going to write more about this later, having recently revisited Gustave Le Bon's wonderful The Crowd.

Meanwhile here is what one philosopher said about the wisdom of the crowd.

"Philosophy: I'm a philosopher; why don't I edit the article on my subject? Because it's hopeless. I've tried at various times, and each time have given up in depressed disgust. Philosophy seems to attract aggressive zealots who know a little (often a very little), who lack understanding of key concepts, terms, etc., and who attempt to take over the article (and its Talk page) with rambling, ground-shifting, often barely comprehensible rants against those who disagree with them. Life's too short. I just tell my students and anyone else I know not to read the Wikipedia article except for a laugh. It's one of those areas where the ochlocratic nature of Wikipedia really comes a cropper".

By Wikipedia editor Mel Etitis, who is a well-known philosopher in real life. He left Wikipedia shortly after this comment in 2007.

Friday, July 16, 2010

Truth in numbers?

In my previous posts on truth [1], [2], I explored the idea that truth has no special benefit over falsity. "It is a piece of idle sentimentality that truth, merely as truth, has any inherent power denied to error". Here, I shall suggest that it is worse than that. Truth has powerful enemies, and there are forces systematically favouring error. As befits a scientific investigation, I shall present the theory, followed by evidence (using the example of Wikipedia once more) supporting it.

First, the theory: Those who support the truth are in a large majority. Of 100,000 people, probably all but ten would like to see the truth. But their interest is only feeble. The remaining ten support false beliefs of various kinds. They probably think these beliefs are true, but they are false for all that. And their false belief is passionate and determined. It follows that, if anyone is allowed to publicise their belief, and if there is a moderate cost to publicising it, such as arguing about it or being involved in an 'edit war'*, the proponents of error will always be victorious. For they are passionate in their error, whereas the others are only feeble in their truth.

Now, the evidence. No sane or normal or reasonable person contributes to Wikipedia, and so all contributors fall into the following broad classes: deviant, aficionado, quack, activist, cultist, crank. The reasons for their persistent interest, together with Wikipedia article examples, are as follows. I list them in approximate order of the power that belongs to them.

1. First are the deviants, usually the sexual deviants. The pedophile lobby has long been an active force on Wikipedia. They are powerful because, like everyone else, they are passionate about their sexuality, seeing it as in some sense normal, and also because no reasonable person is likely to edit articles about pedophilia, 'pederasty' and so on. Hence monstrosities like this, which is indistinguishable from monstrosities like (WARNING: pedophile website - not safe for work) this. See also the article about the PPA 'Wikipedia Campaign' at Wikisposure.

2. Aficionados ('fans') are a large and diverse group on Wikipedia. They are mostly harmless devotees of obscure subjects like Japanese comic books or American science fiction TV. The error lies not so much in the uncritical approach to the subject as the undue weight given to a subject which is essentially ephemeral and unimportant and unencyclopedic. This is probably harmless, although not in the case of Ayn Rand, whom I have discussed before. The coverage of her ideas is extensive in Wikipedia, and out of all proportion to her real importance in philosophy, which is practically nil, particularly outside the United States. Aficionados are powerful in Wikipedia because they are mostly viewed as harmless, and because there is no 'weighting' policy in the encyclopedia. Quite the reverse: they even have a policy: Wikipedia is not paper, and so there is no practical limit to the number of topics it can cover, or the total amount of content. The practice of a normal reference work, which is to assign pages in rough proportion to the received importance of a subject, does not apply here. Thus the academically marginal Ayn Rand receives more coverage than Aristotle, the father of Western philosophy and easily the most important figure in the Western intellectual tradition. The article on his Sophistical Refutations, for example, is no more than a list of contents. Compare this in size and scope with any article on the nonsensical and philosophically illiterate work of Rand, e.g. this.

3. Quacks are peddlers of fake cures, bogus medicine and psychological theory. There is plenty of this on Wikipedia. Their interest is commercial rather than idealistic. The Wikipedia administration does attempt to weed out blatant commercial advertising, but it is also corrupt. The subject that touches me the most is the rubbishy and fraudulent Neurolinguistic Programming. These articles had support at high levels of the Wikipedia administration, and so quite a few more neutrally-minded editors (including myself) were banned for attempting to trim them. See also EMDR, or Ken Wilber, or this lot, gulp.

4. Cults are groups with strange beliefs who have an interest in publicising their existence, recruiting new members, and usually suppressing the more unpalatable facts about their financial statements and other irregularities. Too many of these to mention, but some of the more amusing include the Brahma Kumaris, who seem to enjoy some support among Wikipedians, at least judging from the way that those who opposed them are so regularly blocked, simply for saying stuff like this ("You and your other adherents have wasted too much time of too many people's lives ... never mind mentioning the broken families and suicides that litter your religion's history"). The Scientologists did not fare so well, as is well known (their IP is currently blocked), but that is only because a group of prominent Wikipedians dislike Scientology - it has nothing to do with any self-governing mechanism that prevents cults from promoting their views on the global electronic reference work. See for example Prem Rawat, defended for years by a prominent Wikipedia administrator, although he eventually came to grief.

5. Activists are the supporters of extreme political movements. These are also numerous, but the force can be less strong with them because there are often equally extreme political movements who are bitterly opposed, and so they 'edit war' the talk pages of the articles concerned. See the archives of the articles on The Troubles, the thirty-year conflict in Northern Ireland, which was intellectually as violent as the actual troubles were physically bloody. Or Islamic Republic is the same as Arab Republic. However, the articles on Yugoslav communism, e.g. on Tito seem to have got by with their strange point of view unscathed. They do not reflect the views of more recent historians who view Tito as essentially a Stalinist, but rather adhere to and reflect the propaganda of the former Communist Party of Yugoslavia. The article about Tito is written in a child-like manner, reminiscent of Yugoslav primary school textbooks from the 1970s. This may be the result of the 'partisan' group of editors who control those articles, and the relative lack of interest from anyone else.

6. Cranks. These are individuals who have a passionate belief in some idea, theory or system that they developed on their own, and which has been rejected by the academic establishment. Naturally they turn to the encyclopedia which anyone can edit. Cranks have little power on Wikipedia, because unlike the rest they are not part of a larger group and can easily be picked off, and also there is no fraudulent secondary literature they can tap. Nonetheless they flourish in dark corners - I am glad to see that Boolean logic, which has nothing to do with George Boole, still survives, and that the Ancient Egyptian Race theory seems to be thriving, at least between periods of illness or coma. And how about this whole category of stuff which is completely deranged ("the Mental Plane is located between, and hence is intermediate between, the astral plane below and the higher spiritual realms of existence above").

Add to this the eccentric and perverse features of Wikipedia governance and software. First, the ability of any IP address to edit, which results in a tidal wave of crude vandalism every day. This in itself is not a problem, since there are about 500 administrators who instantly clear it up. The problem is rather the administrators, many of whom are retired ex-military types or police, and who are not interested in encyclopedias at all. They just enjoy whacking vandals. The trouble is that they don't really distinguish between vandals and ordinary editors, and don't really understand the disputes at all. They have no theoretical interest in a dispute between a scientist and a homeopathist or chiropractor. But if they see either of them getting out of line, i.e. infringing Wikipedia's strict 'civility' rules, they whack them anyway.

Second is the fact that accounts are anonymous, and there is not even elementary identity checking. Thus a problem editor who has been 'blocked' can instantly register again. Furthermore, they can create multiple accounts ('sockpuppets' or 'socks') to give the illusion of strong support for error. This frustrates the supporters of truth, many of whom stoop to the same tactics. Worse, Wikipedia has evolved an elaborate secret police ('checkusers') whose job is to spy on accounts for 'sock'-like behaviour and block them if necessary. This deflects the administration from the real job of building a comprehensive and accurate reference work; moreover, it encourages types who are intellectually unsuited for such work.

Which is, of course, why Wikipedia is nothing like a comprehensive and accurate reference work. How would we change this? Well, the theory (that small numbers of passionate devotees of error will always defeat an army of those with a feeble and weak interest in the truth) suggests abandoning the idea that 'anyone can edit'. It's fine if only 'disinterested' persons can edit, but as the different but connected meanings of 'interest' suggest, it is difficult to get disinterested people to do this. They simply aren't, er, interested. Another idea would be this. Levy a small 'falsity tax' on everyone. A majority of people would vote for this, if the tax were proportionate to their feeble dedication to the truth. The money from the tax would then pay for experts with a proven neutrality and lack of 'interest' to write articles. This is essentially the economic model of a university, a system invented in the Middle Ages, and which therefore existed long before Web 2.0.

*Edit war: long protracted dispute on Wikipedia. Sometimes between the forces of truth and error, more often between proponents of different kinds of error (such as environmentalists and oil company employees).

Sunday, June 20, 2010

The Fragility of Truth

Still musing on Wikipedia, I am revisiting Mill's On Liberty which has a fascinating insight into the nature of truth:

But, indeed, the dictum that truth always triumphs over persecution, is one of those pleasant falsehoods which men repeat after one another till they pass into commonplaces, but which all experience refutes. History teems with instances of truth put down by persecution. If not suppressed forever, it may be thrown back for centuries. [...] It is a piece of idle sentimentality that truth, merely as truth, has any inherent power denied to error, of prevailing against the dungeon and the stake. Men are not more zealous for truth than they often are for error, and a sufficient application of legal or even of social penalties will generally succeed in stopping the propagation of either. The real advantage which truth has, consists in this, that when an opinion is true, it may be extinguished once, twice, or many times, but in the course of ages there will generally be found persons to rediscover it, until some one of its reappearances falls on a time when from favourable circumstances it escapes persecution until it has made such head as to withstand all subsequent attempts to suppress it. (On Liberty, Chapter Two).
Quite right. The truth, being recognisable as true, will pop up again and again, and be suppressed again and again, until a favourable accident allows it to prevail and flourish. It is a piece of Wikipedia nonsense that the magic pixie-dust of 'crowdsourcing' will instantly drive out falsehood and all error. The crowd is often wrong. It is often right, of course, but even then it can rarely be bothered to go to Wikipedia to correct the error. Only the obsessive and the insane are likely to do that, and the truth is mostly lost on those.