Beyond Necessity: errors in wikipedia

Saturday, May 26, 2012

Schools Wikipedia

Jon Davies (Chief Executive of Wikimedia UK) gave me a 'Schools Wikipedia' CD which I gratefully accepted along with a Wikimedia UK coffee mug, which I use for my early morning mug (thank you Jon!).

I didn't look at the disc until today, fearing what horrors there might be, but actually it is quite good. It has clearly been edited to remove the worst of the grammatical abominations. The pictures are a more sensible size so there is none of that ugly white space, and, best of all, no footnotes except where necessary. The links to the very worst articles have been removed.

Yet there is one more thing of great significance. When I checked the Age of Enlightenment article, it was much better than the version of the article I criticised here. For example, I criticised the current article (permalink) as stating that the Age of Enlightenment was a movement when, as its name suggests, it is a period. The schools article, by contrast, correctly states that it is a period after all.

Sigh of relief that the corruption of our schoolchildren is not imminent, at least in this case. But why the difference? Had the mistakes been edited out by professionals? Well, probably not. A bit of research shows that the school version dates from around October 2005. Which bears out what I have always said. A lot of reasonably good people were contributing to Wikipedia around that time, then left after a wave of vandalism and trolling hit the project in 2006. The vandalism was like 'drinking water from a firehose'. This was countered by a massive increase in the number of vandal fighters, but at the same time and by that very token the project was turning from building an encyclopedia, which requires certain skills, to protecting against vandals, as well as running a secret intelligence force that would have made Stalin proud, and this requires different skills.

Very significant that the quality of Wikipedia has got demonstrably worse over 2005-12.

Monday, March 19, 2012

Why Wikipedia's Fans Shouldn't Gloat

A wonderful wonderful article in The Atlantic. It speaks for itself, no further comment required.

Sunday, March 18, 2012

The 'one ring' theory of cranks

I'm still working on the Wikipedia book, looking today at the history of the 'three revert rule', introduced in November 2004. The rule, which says that “An editor must not perform more than three reverts on a single page within a 24-hour period”, theoretically puts an end to the endless edit/revert cycle between warring editors. In theory, it is a numbers game which gives the edge to subject matter experts. As I understand the rule (I'm sure Belette will correct me if I am wrong) it applies per player, not per group of players. Two players working together can revert a single opponent up to six times a day, but the opponent, if on their own, can revert only three times. Thus whoever has the largest number on their side wins the edit war.

This seems to give the edge to subject matter experts because generally (though not always) they tend to agree, at least on the kind of slightly out of date knowledge that is appropriate for reference works. Cranks, by contrast, each have their own theory of everything, which is peculiar to itself and inconsistent with every other crank theory of everything. The theory that the earth is flat and the theory that it is a cube are both opposed to the mainstream theory that it is roughly spherical, but they contradict each other too. Likewise for the theories that the moon is made of cheese, and that it is made of candy floss. Thus three experts can beat any number of cranks, so long as the cranks don't agree on anything.
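The arithmetic of the numbers game can be put as a toy sketch. It assumes my reading of the rule above (three reverts per editor, per page, per 24 hours) and treats each side as a bloc that always reverts the other; the names REVERT_LIMIT, daily_reverts and edit_war_winner are invented for illustration and are not anything Wikipedia itself provides.

```python
# Toy model of the 'three revert rule' as a numbers game.
# Assumption: the limit applies per editor, not per group of editors.
REVERT_LIMIT = 3  # reverts allowed per editor, per page, per 24 hours


def daily_reverts(editors: int, limit: int = REVERT_LIMIT) -> int:
    """Upper bound on the reverts one side can make on a page in a day."""
    return editors * limit


def edit_war_winner(side_a: int, side_b: int) -> str:
    """Return which side can out-revert the other: 'a', 'b' or 'stalemate'."""
    a, b = daily_reverts(side_a), daily_reverts(side_b)
    if a == b:
        return "stalemate"
    return "a" if a > b else "b"


# Two editors working together against a lone opponent.
print(daily_reverts(2), "reverts against", daily_reverts(1))
print(edit_war_winner(2, 1))
```

On this model a lone crank always loses to two or more editors who agree, and a group of cranks only counts as a side if its members actually revert in concert.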

However, neurologist Steven Novella makes an insightful observation here that brings this idea into question.

... cranks around the world have been able to form their own “alternative” community, publish their own journals, and have their own meetings. There is just one requirement in this alternative community – acceptance. All ideas are accepted (there is no chaff, all is wheat), that is except for one. Whatever is accepted by mainstream science is wrong [my emphasis]. That is “the one ring” of crank mythology, that brings all crank theories together and in the darkness of their community binds them together. Otherwise they are largely mutually incompatible. Each crank’s “theory of everything” is a notion unto itself, and is mutually exclusive to every other crank’s own theory of everything (unless there is some incidental overlap). So they get together, present their theories without criticism, and all agree that the evil conspiracy of mainstream science must be taken down. Of course, if any of them got their way and their ideas became accepted, they would instantly become rejected by the rest of the crank community as mainstream physics.

Correct. My enemy's enemy is my friend, whatever my enemy believes. I have seen this effect in Wikipedia a number of times. Cranks unite to defeat the mainstream, orthodox view. Orthodox editors get blocked or banned. Cranks then war with each other, and get banned themselves. The orthodox editors mount appeals to the powers that be (the arbitration committee, none of whom have any expert credentials as far as I can see) and get unbanned. Or they just open 'sockpuppet' accounts and start editing again under a different name. So do the cranks, and the whole nightmare begins again. Another difficulty that Novella omits is 'mainstream' crankery. That is, bad science or quackery that unites its practitioners by financial interest. Homeopathy and 'Neurolinguistic programming' are good examples of this.

This would not matter at all, if Wikipedia were not increasingly used as a 'reliable source' by students, and even some medical researchers, as I noted in an earlier post.

Friday, March 02, 2012

The medical condition known as glucojasinogen

Here is a lovely example, possibly the best yet, of what I have called Wikipedia faction. This is where some nonsense information gets added to Wikipedia and stays there long enough for 'reliable sources' to pick it up, so that Wikipedia can then cite the reliable sources for the nonsense.

In 2007 an anonymous IP adds this entry to the Wikipedia article on diabetic neuropathy.

It is important to note that people with diabetes are more likely to develop symptoms relating to peripheral neuropathy as the excess glucose in the blood results in a condition known as Glucojasinogen. This condition is affiliated with erectile dysfunction and epigastric tenderness which in turn results in lack of blood flow to the peripheral intrapectine nerves which govern the movement of the arms and legs.

It's nonsense of course (we spent some time going through a medical dictionary to check). The nonsense then got picked up by two journals: "Influence of Murraya koenigii on experimental model of diabetes and progression of neuropathic pain" by S.V. Tembhurne and D.M. Sakarkar, Journal of Research in Pharmaceutical Sciences, 2010, and African Journals Online (which cites the same paper). It is now visible in Google Scholar.

It would have been more amusing if the sources actually had been added, to complete the circle. But then it's not amusing at all, is it. It's one thing to get most of the key facts about medieval philosophers completely wrong. That just damages learning. It's another to slander someone in public, under the umbrella of a supposedly comprehensive and reliable reference work. That merely damages someone's reputation or even destroys their career. But getting important medical information wrong can damage someone's health or life. That's not amusing. In fact none of it is amusing.

The Wikipedians, by contrast, are having a bit of a laugh about it. Another Wikipedia hoax. The IP editor even got a special 'barnstar'. This is part of the frustration of the place. If something goes desperately wrong, it's a bit of a giggle. Challenge any of this from the outside, however, and you are immediately pointed in the direction of the famous 'Nature' article in 2005. "Wikipedia articles come close to the level of accuracy in Encyclopedia Britannica".

Sunday, February 26, 2012

Wikipedia article feedback

I have just been looking at the Wikipedia rating system. You go to the bottom of an article and click 'view ratings' to see what the crowd thinks of its trustworthiness, objectivity, completeness and quality of writing.

First of all, I don't understand the distinction between 'trustworthy' and 'objective'. Could an article be rated as objective, but utterly untrustworthy? Or lacking any kind of objectivity, but entirely trustworthy? In any case, I can make little sense of the results, when picking on articles that I know are poorly written, incomplete and untrustworthy, but which the system rates as well-written, complete etc.

For example, I've commented on the Ockham article many times, e.g. here. There are many errors and many omissions, and the quality of the writing essentially depends on your view of how easy it is to combine paragraphs from the Catholic Encyclopedia with paragraphs of drivel. How are the lay public supposed to judge on the completeness of coverage of a subject when the whole point of an encyclopedia is to inform them about it? How can they judge its objectivity? I agree that they might be able to judge the quality of the writing, but even this stinker, which has a quality template slapped on it, doesn't score that badly.

Interestingly this one, on Roscellinus, which I can see is a combination of the Catholic Encyclopedia and Britannica 1911, scores worse on quality of writing than the awful ones above. But it's quite well-written, although the style is somewhat antiquated. Perhaps the reason is its use of much longer sentences and paragraphs. Perhaps the 4chan generation prefers articles with short paragraphs and short sentences. So perhaps we can blame the internet yet again, he says, sounding like an old fart aged 70 who reads the Daily Mail.

(I had a look at 4chan yesterday but that is a subject for another vituperative, old fart kind of post).

Saturday, February 18, 2012

Should you trust Wikipedia?

Sounds like one of mine, but actually one of the Maverick's, over here. Recommended, as always.

Tuesday, January 10, 2012

Wikipedia student assignments - what could go wrong

A reader has complained by email that there is too much medieval philosophy and not enough about Wikipedia. Well mate, I'm sure there are readers of this blog who appreciate the medieval philosophy and roll their eyes at the Wikipedia stuff. But, just to oblige, and I admit I have been a bit quiet on Wikipedia for the last few weeks:-

There is a fascinating study here of what happens when you let students loose on Wikipedia. A sample of students were invited to make edits to psychology articles, with depressing results. The supervising editors reported that students found it difficult to write proper citations, despite being trained for academic writing. They did not understand the subject well enough to write for the average reader of an encyclopedia, and made mistakes that even a non-expert could spot. Because of their reliance on a single source, it was difficult for them to paraphrase the source without making mistakes or writing nonsense, and so the text was frequently incomprehensible.

I'll let you read the article and decide for yourselves. But there were a few points hidden in there. The first was how bad the students were at writing in a way that generalists could easily understand. Indeed, it was the Wikipedian mentors, who were not experts in the subject, who were much better at this. I'm not surprised. Writing for a middlebrow audience is difficult, and 'accessibility' is one area where I would fault the otherwise excellent Stanford Encyclopedia of Philosophy. (As well as Wikipedia of course, which fails to be accessible in many interesting ways).

The second was that, as the mentors observe at the end of their paper, the students' poor edits were reverted only because of the care and attention paid to the articles. They say that the fact that the plagiarism and poor content was reverted or fixed "is almost wholly down to the extraordinary efforts of three Wikipedians", and mention that even on popular subjects, the actual number of committed Wikipedians able to police edits is generally over-estimated. This suggests that a lot of poor edits are getting through without being reverted, which doesn't surprise me either.

Tomorrow: Thomas Aquinas.

Monday, November 28, 2011

Wikipedia: major studies detect cooling over Antarctica

On 26 April 2009 – that is, two and a half years ago – an account called ‘Ivanelo’ added some material to the Wikipedia article on ‘Climate of Antarctica’, including the claim that “Since mid 1960s, all major studies detect cooling over the most of Antarctica”.

Today (28 November 2011), climate expert William Connolley reverts the edit, with the comment “rm twaddle (how did that stand for so long)”. Quite. On the assumption that the claim is twaddle, how was it not spotted for so long? Particularly as Connolley himself had worked on it in the interim.

This is something to store up for my discussion with the UK Charity Commission. Wikimedia UK managed to persuade them – see the discussion here – that processes exist on Wikipedia to ensure high standards of article quality. Their solicitors, Stone King, certainly assured them that “the content promoted has sufficient editorial controls and safeguards on the accuracy and objectivity of the information provided”. This is highly questionable. Connolley is one of a few subject matter experts who understand how to make judgments about their area of specialism. But there are not many like him, and the whole process of Wikipedia governance is inimical to such specialism. Connolley himself would like the Arbitration Committee of Wikipedia "to think more about content and less about conduct" - see his election guide here - but he knows that will not happen.

Thursday, September 01, 2011

Masks at the masked ball

There is an interesting discussion here between some of Wikipedia’s dwindling number of competent editors. The complaint is the usual one about the Wikimedia Foundation's obsession with the overall number of editors, rather than the quality of work that editors produce. “Certain people associated with the Foundation have been saying for years that it doesn't matter who makes the edits; we are all just masks at the masked ball, and what matters is numbers alone. I suspect they'll start to see the folly of that position, though it may take a few years”.

I suspect the Foundation will not see the folly of that position, because of its deep-rooted faith in crowdsourcing. The official teaching of the church of Wikipedia is that editors are easily replaceable units of work, each of whose contributions is equally valuable. Supposedly, large numbers of small edits will, over time, make an article 'drift' towards quality and accuracy, even if each individual edit only improves the article imperceptibly.

But as one of the editors comments (and I am certain she is right) “when the history of Wikipedia is written, we're going to be astonished by the small number of people who created and maintained it”. Another agrees that “crowd-sourcing is largely irrelevant, as most articles are edited by very few editors, often only two or three, which is hardly a crowd, and almost all of the content often comes from only one or two editors”.

This is almost certainly true. Good articles are written by a handful of good editors. The same is true of bad articles, which are nearly always the result of a single spectacularly incompetent and inept author writing a personal essay about a favourite subject. See e.g. Intellectual history (“The concept of the intellectual is relatively recent”), almost entirely written by this chap. Or how about the really awful History of Europe, quite an important subject, you would think, and deserving of thoughtful treatment, but which contains such gems as “During this time many Lords and Nobles ruled the church. The Monks of Cluny worked hard to establish a church where there were no Lords or Nobles ruling it. They succeeded”.

Nor does the crowd always pick up even the easily correctable mistakes. In May 2005, someone claimed that Belgian businessman Georges Jacobs (born in the late 1940s) was a commander of the Waffen SS's 28th SS Volunteer Grenadier Division Wallonien (presumably disbanded in 1945). The claim was removed only recently (August 2010), and so had been there 6 years without any of the ‘crowd’ noticing.

But then why are the masks complaining, while they continue to edit Wikipedia? The whole experiment only appears to work while these poorly rewarded individuals fight a losing battle against a tidal wave of vandalism and illiteracy. They need to stop fighting, get a life, and watch Wikipedia work out a solution for itself. (And if Connolley is reading this, that means him too).

Friday, May 27, 2011

Can crowdsourcing make all articles excellent?

Ex-climate scientist William Connolley questions the logic of an earlier post, saying that

I'm not really sure exactly what claim about crowdsourcing you are calling false. If the claim is, crowdsourcing makes all wikipedia articles excellent, then it is trivially false. If the claim is, crowdsourcing is capable of creating excellent articles, then it is trivially true. Probably you mean something else, but what?

Well, as to exactly what claim of crowdsourcing I was originally calling false, in this post I cited the articles on Durandus and Roscellinus as evidence against the claim that crowdsourcing makes Wikipedia "instantly responsive to new developments". The fact that these articles are entirely plagiarised or 'copied' from the 1913 Catholic Encyclopedia and Britannica 1911 suggests that Wikipedia is somewhat sluggish on the 'new development' front.

On the other possible claims that William mentions, I disagree that "crowdsourcing makes all wikipedia articles excellent" is trivially false. Since we agree it is false, it follows that crowdsourcing fails to make certain articles excellent, and it is an interesting, and therefore non-trivial, question whether there are certain types of article, or certain types of information that crowdsourcing fails to make excellent, and if so why.

I don't propose to answer these non-trivial questions here - I merely point out that they are obviously non-trivial. I have made suggestions in the past. I suggested that "crowdsourcers are typically shy of deleting material, so articles tend to grow to the point of being unreadable. Second, they have no sense of where material ought to go. So the article tends to lose any basic thread it once had. Third, they have no sense of which facts to include, and which to leave out. What facts about Aristotle would you include in a three page article?". I have also observed that, as a general rule, Wikipedia's coverage of subjects like Boron and set theory is pretty good. On the arts and humanities it is a complete disaster. As Vaknin says, "they are replete with nonsense, plagiarism, falsities, and propaganda".

So it's an interesting question as to whether the poor quality of arts subjects is simply a matter of accident, and could have been the other way round. Another interesting (and therefore non-trivial) question is whether crowdsourcing is better at 'low culture' than 'high culture'. My view is that it is pretty good at articles like this, but really dreadful at articles like this. More later.

Wednesday, May 11, 2011

Derrida and Wikipedia

Someone has left an interesting note on Jimmy Wales's Wikipedia talk page, about the article on the infamous French philosopher Jacques Derrida. I won’t copy the note as it is quite long and you can read it yourself if you follow the link. But I will summarise it here, as it captures well one of the fundamental problems of Wikipedia.

This is a very important article about one of the most influential philosophers of the 20th century.
It would be a great thing if Wikipedia had produced a good article on him.
It hasn’t. The article is really really awful.
Worse than that, if any competent person has actually improved it in the past, the article soon degrades (the commenter gives the current version and the version from one year ago to prove this).
Perhaps there is something fundamental in the very structure of Wikipedia itself that prevents it from reaching even basic levels of competence about topics such as this?

Having looked at both versions of the article, I tend to agree with him or her. I am not an expert on French philosophy. But the problem with the current article is not a matter of philosophical expertise, but of communicating some difficult ideas on a broad subject in a small amount of space, to a reader who has no expertise in or knowledge of the subject.

The current version shows all the typical weaknesses of crowdsourcing. First, crowdsourcers are typically shy of deleting material, so articles tend to grow to the point of being unreadable. Second, they have no sense of where material ought to go. So the article tends to lose any basic thread it once had. Third, they have no sense of which facts to include, and which to leave out. What facts about Aristotle would you include in a three page article? They don’t know this, so the article tends to move awkwardly from wide, sweeping, often 1066-ish statements about the world and the universe, to what football team the subject supported, or what he had for breakfast in March 1969.

Saturday, May 07, 2011

Wikipedia and Truth in Fiction

I have commented more than once about erroneous information in Wikipedia becoming what Wikipedia calls a ‘reliable source’, and so turning a previously unreliable source into a reliable one. Here is another fascinating example, reported in the Telegraph today.

It began with alterations to the online Wikipedia entry of the art dealer Philip Mould, by some anonymous contributor questioning the importance of “discoveries” and suggesting other dealers had made far more important finds. Then, in October 2009, the same person sent a “press release” to national newspapers, falsely claiming Mould was having an affair with Charlotte Barton, a 42-year-old artist.

These were complete fabrications. The problem for Mould was how to remove the entries. The slanderous allegations were now in the tabloid press, and Wikipedia could now substantiate the same unsourced allegations with 'reliable sources' (yes, Wikipedia does treat tabloids as 'reliable sources' as most Wikipedian editors do not have access to proper libraries, and rely on the internet almost exclusively). So Mould was forced (apparently) into an 'edit war' with Wikipedians who were determined to defend the source.

21:06, 11 October 2009 a Wikipedian called Trident13 adds a 'personal life' section.
08:51, 12 October 2009 Mould (or someone acting for him) removes it, with the comment 'unnecessary gossip deleted'.
10:10, 12 October 2009 some editor called Teapotgeorge adds the material back, with the comment "revert removal of referenced material by coi editor". Yes that's right. Philip Mould cannot remove this slander (a) because Wikipedians imagine the planted story is a reliable source and (b) even more hilariously the subject of the slander has a 'conflict of interest'.
17:24, 15 October 2009 Mould attempts to remove it again.
17:42, 15 October 2009 Teapotgeorge adds it back, commenting 'You have a conflict of interest please don't remove referenced material take to talk page'.
09:50, 16 October 2009 Mould removes once more.
12:32, 16 October 2009 Teapotgeorge thankfully gives up but moves the contested material to the 'talk page' of the article. But it does not end there.
19:38, 9 December 2009 an anonymous IP address changes "The couple separated in May 2009, after Mould started an affair with artist Charlotte "Charlie" Barton." to "The couple separated in May 2009, after Mould started an affair with marriage wrecker Charlotte "Charlie" Barton." ...
19:40, 9 December 2009 ... then changes 'marriage wrecker' to 'notorious marriage wrecker'.
19:44, 9 December 2009 Teapotgeorge changes 'notorious marriage wrecker' back to 'artist', but comments that he is reverting good faith edits. Good faith?
00:51, 10 December 2009 The IP changes back to 'marriage wrecker'.
23:46, 6 May 2011 And there, incredibly, it stays for a year and a half, until yesterday when a senior Wikipedia administrator 'NewYorkBrad', who for once is not anonymous, being Ira Brad Matetsky of the law firm Ganfer & Shore, LLP in New York, removes it at last, commenting "Section removed. There is evidence of a deliberate plot to defame the subject of this article. For those investigating this misuse of Wikipedia, the content formerly here can be found in the page history".

Matetsky should know better than to have anything to do with Wikipedia, and should perhaps use his legal expertise and influence to prevent this sort of thing happening at all, but he did the right thing here. But the wider problem remains. The issue is how Wikipedia, which is a very reliable source on stuff like Boron and the orbit of Jupiter and the US road system, leverages its unquestioned reliability in the field of science onto the murky and poisonous world of the internet.

Wednesday, March 23, 2011

More fiction turns to fact

I reported here about erroneous information in Wikipedia being used in reference works, thus becoming what Wikipedia calls a ‘reliable source’, and so turning a previously unreliable source into a reliable one. There is another excellent example being discussed on William Connolley’s talk page. William corrects the absurd claim that Sharaf al-Din al-Tusi was the first to discover the derivative of cubic polynomials, but is immediately challenged by another Wikipedia editor, who says

You chose the wrong example. "In the 12th century, Sharaf al-Din al-Tusi found
algebraic and numerical solutions to cubic equations and was the first to
discover the derivative of cubic polynomials." is in the Encyclopedia
of Ancient Egypt at Google Books, and it took me just 0 seconds to found it
out. Please DO NOT DELETE contents because of your POV. Please, use inline
templates instead. Cheers. –pjoef 20:10, 18 March 2011 (UTC)

William points out that the source for this disputed statement - the Encyclopedia of Ancient Egypt which at first sight looks as reliable as you could get – had itself used Wikipedia as a source. He goes on to say that “the source is therefore clearly worthless as a citation to support statements made in Wikipedia”. Yes, although I question his ‘clearly’: it may have been obvious in this case, but as more secondary sources use Wikipedia as a primary source, the problem will continue, and will be increasingly difficult to spot.

Friday, March 04, 2011

Wikipedia: fiction becomes fact

A fascinating example of how erroneous information on Wikipedia leaks out into the web and turns into citable information that can then be recycled back to Wikipedia as verifiable. A Wikipedia editor in February spotted the following strange edit made in November 2002 – nearly 10 years ago – mentioning a scientist’s claim that the Van Allen belt is the result of volcanic activity. Which is of course nonsense. The editor Googled for this and got plenty of hits, and it nearly went unchallenged. However, suspecting that the age of the Wikipedia entry had caused this claim to become accepted fact, he emailed the scientist, who confirmed it was nonsense, and the edit was finally reverted on 18 February 2011.

The ensuing discussion is on the Wikipedia Astronomy project page.