Friday, November 05, 2010

Comparing the different hypotheses for Strepsiptera

Halictophagidae
I have been comparing the different phylogenetic hypotheses for the Strepsiptera. As always, it has been hell to get the trees from images back into a suitable format for topology comparison. I used the program METATREE to compare multiple trees. The tree of trees essentially shows the two main hypotheses: at one end, "Strepsiptera sister to Diptera", supported by the studies of Whiting and Wheeler using 18S and 28S rDNA, and at the other end of the metatree, "Strepsiptera sister to Coleoptera", supported by morphological studies and recent molecular studies using nuclear genes. Other differences in the Holometabola topologies are also illustrated in the metatree, such as the variable position of the Hymenoptera. Anyway, to me, it looks like the research community is reaching a consensus on the "Strepsiptera problem": Strepsiptera sister to Coleoptera. What do you think?
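For anyone curious what a topology comparison boils down to, here is a minimal sketch. It is not METATREE's method (which I haven't described here); it uses a Robinson-Foulds style count of the clades found in one rooted tree but not the other. The nested-tuple trees are toy examples made up for illustration, not the published topologies.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        # A leaf: just a taxon name, no internal clades below it.
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade defined by this internal node
    return leaves, found


def rf_distance(tree_a, tree_b):
    """Count the clades present in exactly one of the two trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must contain the same taxa")
    return len(clades_a ^ clades_b)


# Toy topologies (hypothetical, for illustration only).
diptera_sister = ((("Strepsiptera", "Diptera"), "Coleoptera"), "Hymenoptera")
coleoptera_sister = ((("Strepsiptera", "Coleoptera"), "Diptera"), "Hymenoptera")

print(rf_distance(diptera_sister, coleoptera_sister))  # prints 2
```

Computing this for every pair of trees gives a distance matrix, which is the sort of input a "tree of trees" could be built from.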
Friday, October 15, 2010

TreeRipper: towards a fully automated optical tree recognition software

Unfortunately, my TreeRipper program has been rejected from BMC Bioinformatics for now because there are too many delegate programs that the reviewers didn't manage to install successfully. So until I find time to write a makefile that can deal with the installation on multiple platforms, I have put the manuscript on Nature Precedings and you can find the code at Google Code. I think I am the first to attempt to fully automate the conversion of a tree image into something more useful for researchers, and I hope that what I have done can be built upon and improved. I have attached to the code a set of images and tree files that might be useful for training and/or benchmarking future programs.
Wednesday, September 01, 2010

2nd UK RAD Sequencing Meeting: wrap-up #RADseq

The 2nd UK RAD sequencing meeting took place at the e-Science Institute in Edinburgh. John Davey did a brilliant job of organizing the event, and the e-Science Institute is always a good venue for meetings and workshops.
In the morning session we heard from the developers of RAD sequencing and about a useful set of bioinformatic tools from John Davey: RADtools.
Dr William Cresko (Oregon) talked about the history of RAD tags and some recent applications in their lab, which mainly center around understanding phenotypic evolution in the non-model three-spined stickleback.
Dr Susan Bassham (Oregon) brought up quite a few practical issues on the molecular side of things. One that stuck in my mind is that it is much better to use restriction enzymes that result in similar fragment sizes, and that smaller fragments are better, so that the intensity of the dots on the Illumina flow cell is similar. Ideally, the samples you are planning to mix should have similar GC content, so mixing different species is probably not recommended.

Dr Paul Hohenlohe (Oregon) talked about cleaning up the data after sequencing, in particular how to detect sampling biases between different tags and different alleles, and how to detect sequencing errors. Other sources of error include PCR variance and polymorphisms at the RAD sites. All of these need to be accounted for in the analyses. He has developed a maximum likelihood genotyping method based on a multinomial distribution of the reads. The sequencing error parameter is estimated independently for each site. His most recent paper goes into further detail.
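To give a feel for what multinomial genotyping looks like, here is a rough sketch. It is not the code from the talk or the paper: the parameterisation (a homozygote's allele seen with probability 1 - 3e/4, each heterozygote allele with 0.5 - e/4, every other base with e/4), the error grid and the function names are all my assumptions.

```python
import math
from itertools import combinations

BASES = "ACGT"
# Candidate diploid genotypes: 4 homozygotes and 6 heterozygotes.
GENOTYPES = [(b,) for b in BASES] + list(combinations(BASES, 2))
# Hypothetical grid of per-site error rates to search over.
ERROR_GRID = [i / 1000 for i in range(1, 201)]


def log_likelihood(counts, genotype, eps):
    """Multinomial log-likelihood of the read counts, up to a constant."""
    total = 0.0
    for base in BASES:
        if base in genotype:
            # One allele takes nearly all reads; two alleles split them.
            p = 1 - 3 * eps / 4 if len(genotype) == 1 else 0.5 - eps / 4
        else:
            p = eps / 4  # base only observed through sequencing error
        total += counts.get(base, 0) * math.log(p)
    return total


def call_genotype(counts):
    """Return the (genotype, eps) pair with the highest likelihood at one site."""
    return max(
        ((g, e) for g in GENOTYPES for e in ERROR_GRID),
        key=lambda pair: log_likelihood(counts, pair[0], pair[1]),
    )


genotype, eps = call_genotype({"A": 10, "G": 9})
print(genotype, eps)
```

The multinomial coefficient is left out because it is the same for every genotype at a site, so it does not change which one wins. The error rate is searched separately for each call, in the spirit of estimating it independently per site.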

Dr Simon Baxter (Cambridge) is using RAD-tags for gene discovery and linkage mapping in the diamondback moth. He is mainly interested in the evolution of resistance to pesticides.

Dr Maureen Liu (Nottingham) is using RAD-tags to determine the genetic mechanism for left-right chirality in the pond snail. Out of the 70,000 RAD-tags she got from cutting the 1.4 Gb genome with the restriction enzyme SbfI, she found a subset of 19 that were linked to chirality. She also managed to link it to a specific gene but kept the name of the gene hush hush.

Dr Shapiro and Dr Justin Gerke (Princeton) used RAD-tags for a global survey of C. elegans. They found that some strains share nothing and others share everything, but most share about 40% of their genome, and 94% of the strain pairs analysed share one fragment. The shared fragments are also large. They suggest that this may be due to recent migration, but selection, both background selection and positive sweeps, may also play a role.

The remaining afternoon talks were about projects that were getting started and it was a good opportunity for the speakers to receive advice from the guys who had already used RAD-tag.
  • Developing RAD markers as a resource for plant breeding using the perennial ryegrass Lolium perenne. Dr Matt Hegarty (Aberystwyth)
  • Unearthing the functionally relevant genetic diversity from the earthworm genome. Dr Pete Kille (Cardiff)
  • Exploring the use of RAD markers in tree breeding programmes. Dr Pablo Fuentes Utrilla (Edinburgh)
  • RAD Sequencing for applied conservation genetics. Dr Rob Ogden (TRACE Network, Edinburgh)
  • Adaptive significance and genetic basis of a balanced colour-polymorphism in Philaenus spumarius. Dr Octavio Paulo (Lisbon)
  • RAD genetic mapping of reproductive mode in tadpole shrimps. Tom Mathers (Hull)
  • The genetic architecture of a fundamental social trait. Dr Yannick Wurm (Lausanne)

All the talks were very honest about the difficulties of the approach. I really got the feeling that the community was working together to improve the technique and advise researchers on how to use the method.

Update: The talks are now available here.
Thursday, August 26, 2010

Google Voice, Skype or TalkTalk

This morning, when I logged into my Google account, I got a little pop-up ad saying that I had $0.10 of Google Voice credit. Prices seem cheap and I want to see whether it would be worth using Google Voice instead of Skype for my calls. I have already posted a comparison of VoipCheap, Skype and Gizmo here. It looks like Google have set themselves up in direct competition with Skype, so I am going to try to work out when I would be better off using Google Voice.
We tend to call friends and family in Mexico, France, Norway and occasionally Spain.

Prices in pence per min incl. VAT

Destination                      Google Voice   Skype
Mexico (Landline)                6.5            6.4
Mexico (Mobile)                  12             20.7
Mexico-Guadalajara (Landline)    1.3            2.1
Mexico-Mexico City (Landline)    1.3            1.4
Mexico-Mexico City (Mobile)      12 ?           20.7 ?
Mexico-Monterrey (Landline)      1.3            1.4
France (Landline)                1.3            1.4
France (Mobile)                  9.7            13.2
Norway (Landline)                1.3            1.4
Norway (Mobile)                  13             13.8
Spain (Landline)                 1.3            1.4
Spain (Mobile)                   12             17.7

There's very little in it when comparing calls to landlines but Google Voice does seem to be cheaper when calling mobiles. I will check sound quality but unless it is way better than Skype, I will stick with the devil I know.
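If you want to plug in your own calling pattern, something like the sketch below would do. It assumes the first price in each pair is Google Voice and the second is Skype (which is what matches Google Voice being cheaper for mobiles), and the monthly minutes are made up.

```python
# Pence per minute incl. VAT, stored as (google_voice, skype).
# Assumption: first price is Google Voice, second is Skype.
PRICES = {
    "Mexico (Landline)": (6.5, 6.4),
    "Mexico (Mobile)": (12.0, 20.7),
    "France (Landline)": (1.3, 1.4),
    "France (Mobile)": (9.7, 13.2),
    "Norway (Landline)": (1.3, 1.4),
    "Norway (Mobile)": (13.0, 13.8),
    "Spain (Landline)": (1.3, 1.4),
    "Spain (Mobile)": (12.0, 17.7),
}


def monthly_cost(minutes):
    """Return (google_voice, skype) totals in pence for minutes per destination."""
    google = sum(PRICES[dest][0] * mins for dest, mins in minutes.items())
    skype = sum(PRICES[dest][1] * mins for dest, mins in minutes.items())
    return google, skype


# Hypothetical month of calls, in minutes.
usage = {"France (Landline)": 120, "Mexico (Mobile)": 30}
google, skype = monthly_cost(usage)
print(f"Google Voice: {google:.1f}p, Skype: {skype:.1f}p")
```

With mostly landline calls the two totals stay close; it is the mobile minutes that open up the gap.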
Friday, August 06, 2010

Getting Evernote to OCR for you

Evernote is a great app that lets you make notes, take photos and screenshots, and bookmark webpages, and it syncs everything so that you can access it from any computer, iPhone or iPad. What I love about it is that when you take a screenshot, it uploads the image to its server and tries to OCR the text, even hand-written text. Although, technically speaking, Evernote doesn't actually do OCR:
"Evernote's image processing technology is a bit different. We analyze the image to generate a set of possibilities for each word that we see in an image. Each possible interpretation is given a score.
For example, we may look at a word and decide that this word may either be "clue" or "due", and we can assign a score to each possibility. This set of scored possibilities is stored in our database for searching.

As a result, there isn't a simple text representation that you can use. Instead, you can search our database to find the image based on the different sets of possible interpretations for each word."

Well, there is a way to get to that OCRed text (or data interpretation). You can find it in the files in Metadata/com.evernote.Evernote. Just in case anybody was wondering.
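The idea in the quote, storing scored candidates for each word rather than one text string, can be sketched in a few lines. This is only my toy illustration of the concept: the data layout, scores and function names are invented and say nothing about Evernote's real format.

```python
from collections import defaultdict


def build_index(images):
    """Map each candidate word to the images it may appear in, with its best score."""
    index = defaultdict(dict)
    for image_id, words in images.items():
        for candidates in words:  # one dict of {candidate: score} per word seen
            for text, score in candidates.items():
                key = text.lower()
                index[key][image_id] = max(score, index[key].get(image_id, 0))
    return index


def search(index, query):
    """Return (image_id, score) pairs for a query word, best match first."""
    hits = index.get(query.lower(), {})
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Hypothetical recognition results: each word keeps several scored readings.
images = {
    "note-1": [{"clue": 60, "due": 35}],
    "note-2": [{"due": 80}],
}
index = build_index(images)
print(search(index, "due"))
```

Searching for "due" finds both notes, because the ambiguous word in note-1 keeps "due" as a lower-scored reading instead of being forced to "clue".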
Thursday, June 03, 2010

The long short communication of M. Stift

Marc Stift, also known as Stifty by some, has recently left Glasgow for the sunnier climate of Portugal. Just this week, the (not so short) short communication that he worked on tirelessly has been published. I kept on trying to convince him that it was important that people understood the statistics of small sample sizes and that he should include the simulation graphs that he generated for determining the statistical power when determining the inheritance in tetraploids (but he ignored me). This is probably for the best, as the figures would probably just have lingered in the depths of the supplementary materials. I am still working on convincing him to release those figures, which might end up in his first blog or in a second paper. I'll keep you informed!

Monday, May 31, 2010

Increasingly wondering what publishers do for science and research

It seems that researchers are increasingly using Open Access (OA) platforms like arXiv or Nature Precedings to disseminate their work. Authors often submit their pre-print manuscript version to these sites or to in-house repositories, for example Enlighten at the University of Glasgow. Researchers are at least encouraged to do this by funding bodies and research institutions.
This often means that the article is in circulation before the publisher's nicely formatted version (see, for example, Rod's Elsevier Grand Challenge Paper) and raises the question "What do the publishers do for research and the scientific process?".
Ten years ago, it was easy to see the role that publishers had: they disseminated your work by convincing libraries and individuals to subscribe to their journals. In this way, your research had a chance to be seen on a few library shelves across the world after a few months of format checking and page layout with the publisher. But now, the worldwide web does that for you and does it immediately!
O.K., publishers do play an important role in the review process. They make sure they get a famous and qualified editorial board who select good papers for review and choose good reviewers for the job. This increases the impact of the journal and so feeds back into the status of the journal and the publishers. But increasingly, this seems to be the only thing that publishers are providing, and it could be done by other institutions like universities.
It seems that publishers have caught on to the fact that things are changing fast and that they need to do something about it. One solution is to enrich the reader's experience of a paper if they read it on the publisher's website as opposed to the pre-print PDF version in these OA archives. This, I think, was the idea behind the Elsevier Grand Challenge and perhaps behind the PLoS Hub for Biodiversity. There is no doubt that we are in need of better ways of finding research and data, with the ever-increasing number of publications to keep up with.

Wednesday, April 21, 2010

A way to improve the commenting system on publications

I was chatting to Rod a few days ago about his visit to CalAcademy for a PLoS markup meeting, and our discussion strayed onto the commenting system of publications like PLoS and the BioMed Central journals. I was saying to him that I had left a few comments, both as an individual and as a member of the molecular ecology discussion group in the department (not that I can remember the journal or the nature of the comments anymore), and that I had never received a response or feedback from the authors of the manuscript. Additionally, it isn't possible for me to get a list of the comments that I have left. Rod suggested that what these sites really needed was a way to credit readers and commenters of the articles in a similar way to Disqus. Disqus is great because all the comments you make on a range of different blogs and sites can be accessed at Disqus, where you can manage them yourself, edit them and delete them. You are in control of what you have written. You can also see all the responses to your comments. Additionally, you get credits from fellow commenters for the comments and ideas that you put forward. It would be great if you could have the same level of control over the comments you leave on manuscripts at PLoS and BioMed Central.
So I wrote a quick email to the guys over at BioMed Central:

I was wondering whether there was a central page I could go to,
to see the comments I have left on manuscripts along with the responses to
those comments. If this doesn't exist, I think it should. Creating a system
of commenting like disqus might encourage more people to comment on articles.
It enables the commentor to keep track of their comments, receive feedback, get rated and thus gives an incentive to comment. You could just enable disqus on your site.

Kind regards,
Joseph Hughes

I got this in reply:

Dear Joseph Hughes
Thank you for contacting BioMed Central.
Readers' Comments are available to view on the right-hand menu of all published articles:
The link below will take you to an example article where a comment has been left, you can access the article by clicking the associated link in the right-hand menu:
If you have any questions please don't hesitate to contact me.

Best wishes
David Roman

Either I was unclear or they just don't get it! In any case, I am very pessimistic about the system of commenting getting any better in the near future.

Friday, February 05, 2010

The people behind the paper

I thought it might be interesting for readers to hear about how our recent paper came together and especially who the people are behind the names. A while back, samples of Halipeurus lice landed on Rod Page's desk. As a postdoc in his lab, I was charged with storing them in the freezer and entering the relevant data in the now defunct lousebase, although the data is now available on Google Docs.
Ruth Brown, a PhD student at the time working at the Zoological Institute London, had sent the samples, suggesting that it would be interesting to sequence them. The specimens came from the Trindade petrel (Pterodroma arminjoniana) on Round Island (near Mauritius), where she had been working for her thesis. The presence of this petrel on Round Island was probably the result of a recent colonisation, as there were no records of the petrel on the island prior to the 1960s.
I did not realize why it would be interesting to do this sequencing until I met Leandro Bugoni from Brazil. He was just finishing his PhD at the University of Glasgow with Bob Furness. He had been working on the Trindade petrel on Trindade Island (near Brazil). Interestingly, they had found little difference between these two island bird populations whether it was based on genetic markers, morphology or calls and yet they hosted different lice species.
We knew this thanks to the expert identification skills of Ricardo Palma, based at the Museum of New Zealand Te Papa Tongarewa, with whom I had collaborated before on Pectinopygus lice on pelicans.
What we did not know was how these two lice species were related, how divergent they were, and where they originated. Had they always been associated with the Trindade petrel? Had they recently parasitized one of the bird populations?
Fortunately, a friendly Faroese MRes student, Sjudur Hammer, chose my suggested project proposal and did all the lab work. He coped very well in the lab and managed to get some good quality sequences for the Halipeurus lice from both islands.
And voilà! We published what I think is a cool cospeciation study of lice and gadfly petrels, and it was a pleasure to collaborate with such an international group of researchers.

Thursday, February 04, 2010

Image searching

I have been trying to find images on the web that will match images that I upload and I have come across a few cool websites.
To start with there is pixolu. Although it doesn't enable you to upload an image, it lets you search for particular key words and then select the types of images that you would like. Using these images, it then refines the search to find images similar to those that you have selected. I thought it was a very nifty tool. This is quite similar to Google Similar Images, except that it has a nicer interface, and a lot of images in Google Similar Images don't actually have any similar images even though the same image in pixolu does.
Then there is retrievr, which still doesn't let you upload an image, but you can squiggle something in a box and it will find similar images on the web. I was impressed to find that when I did a very rough drawing of a tulip, the search did pull up an image of a tulip (see the screenshot).

But something like Gazopa was what I was really looking for. Unfortunately, it doesn't work as well as I would hope. Even though the image I upload is available on Flickr, it doesn't actually find it. It does do a good job at finding images with similar colours (see screenshot).

Thursday, January 14, 2010

Species Image ReCAPTCHA

After seeing the inspiring talk of Luis von Ahn on PopTech, I was thinking that CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) could be a cool way to improve the OCR of species names in books OCRed by the Biodiversity Heritage Library particularly as Rod is currently using ReCAPTCHA on BioStor. BioStor is helping to annotate and extract data from BHL and Rod is using ReCAPTCHA to check that the annotation is being made by a human.
In my limited experience of OCRing, I have found that italicized species names are particularly hard to get right, so if ReCAPTCHA could be set up to use species names from BHL, then users annotating BioStor would be improving the OCRing of BHL articles and helping to annotate them.
Along the same lines, I have been thinking that it would be great if you could use the power of the brain to annotate species images that are on the web, a bit in the style of Google's Image Labeler, which I find addictive, so I have been avoiding it for a while. Then, quite by chance, I came across this image CAPTCHA at the University of Edinburgh.
I thought this would be a really cool way to help tag wildlife images, at least using common names. You could also do a pro version for taxonomists with Latin binomial tagging.

