Ever since we first heard about MinION genetic sequencers we’ve been excited to see how we can use this technology at Science Practice. We recently managed to try it out for the first time.
Excited to use our #MinION to be #sequencing lambda phage with @LaurenCowley4 today! @nanopore pic.twitter.com/WPBkfutZS6
— Science Practice (@sciencepractice) 25 January 2016
MinION is a genetic sequencer that is small enough to fit in your pocket. Developed by Oxford Nanopore, it uses a method called nanopore sequencing to read genetic bases in DNA in real time.
Our very own MinION: a pocket-sized, USB-powered genetic sequencer.
We first received our MinION last year but have had to go through several steps to get started. Firstly, hunting down a computer with very specific requirements (Windows 7 is becoming increasingly hard to find!), and then verifying our hardware and software setup by running a data exchange with a dummy (“configuration”) flow cell. The following screenshot shows uploading, processing and downloading 20 test data files with Metrichor, which is Nanopore’s cloud-based analysis tool.
Data exchange with Metrichor during configuration.
Only then could we order our flow cells and their accompanying reagents. These reagents need to be stored at -20 °C, so we bought a household freezer, hopeful that -18 °C would do the trick.
Inspired by stories and pictures of using MinION in exotic places, we were raring to go with our MinION and new laptop. However, we soon realised that we would need a table-full of other lab equipment. Besides specialist equipment for quantifying DNA in a sample, even standard items like pipettes and vortexers add up to a basic equipment bill of around £5000, and facilities for storage and disposal are also necessary. This equipment list made the MinION significantly less portable than it first appeared.
To get started with MinION we decided to first trial the protocol in an established lab. Lauren Cowley from Public Health England had previously given a talk about her work with MinION at one of our Mobile Health Meetups, and she kindly agreed to show us the ropes.
When you first start using a MinION, Oxford Nanopore recommends running a standard control ‘burn-in’ experiment to familiarise yourself with the protocol, the equipment and what the data looks like. The burn-in involves sequencing a small viral genome called lambda phage, and comparing it against a reference to check that everything is working properly.
Over in Lauren’s lab at Public Health England, our first step in running the burn-in was to set up all the equipment and reagents we would need for the experiment. Cue a line up of pipettes, tubes for mixing, a magnetic rack, reagents on ice, and heating, mixing and measuring equipment.
Oxford Nanopore's sample preparation reagents for the burn-in experiment.
A genetic sample needs a few processing steps before it can be loaded onto the MinION for sequencing. This is known as ‘library preparation’ because a long DNA strand is broken up into a ‘library’ of DNA fragments with special sequences on their ends.
Fragmented DNA of around 8kb is recommended for the burn-in, as this allows important enzymes to be bound to the ends of each fragment, and also allows the DNA to efficiently pass through the pores of the flow cell. Fragmentation can be done in a number of ways, but for the purpose of this simple burn-in we relied on DNA’s natural fragmentation during processing.
The next step is to bind enzymes to each end of the DNA which facilitate the sequencing reactions. One enzyme is responsible for recognising the pore on a flow cell and directing the DNA fragment into it, while the other creates a ‘hairpin loop’ between the double strands of DNA. This means that after a single strand of DNA has been through a pore, its complement is pulled through afterwards, creating two reads of each fragment.
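To make the "two reads of each fragment" idea concrete, here is a minimal Python sketch. It assumes idealised, error-free reads, and the function names are our own for illustration, not part of any Oxford Nanopore software. Because the second strand is the complement of the first and passes through the pore in the opposite orientation, its read should be the reverse complement of the first read.

```python
# Illustrative only: real nanopore reads contain errors, so real tools
# align the two reads rather than testing for exact equality.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def reads_agree(template_read, complement_read):
    """Check whether the second read confirms the first one.

    Flipping the complement read back with reverse_complement() should
    reproduce the template read exactly when there are no errors.
    """
    return reverse_complement(complement_read) == template_read


print(reverse_complement("ATGC"))   # GCAT
print(reads_agree("ATGC", "GCAT"))  # True
```

Having two passes over the same fragment is what makes this kind of cross-check possible.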
After each sample-prep step it is important to clean up the DNA and get rid of lingering chemicals from previous steps. The clean-up step relies on tiny magnetic beads with a surface coating that binds and releases DNA under certain chemical conditions, which allows the separation of DNA from non-genetic material. In the clean-up step, Eppendorf tubes containing the DNA sample mixed with magnetic beads are placed on a magnetic tube rack. The strong magnet concentrates and holds onto the magnetic beads and the DNA bound to them, so that the rest of the fluid can be carefully pipetted off.
Lauren supervises careful pipetting, to avoid losing or damaging the DNA sample.
Because it was our first experiment, sample preparation took us about 2-3 hours, but it could be done in an hour.
With the library prepped, we were ready to connect our MinION to our laptop, open up the sequencing program called MinKNOW on our laptop, and load our sample into the MinION for sequencing.
The day wouldn’t have felt complete without some drama, so naturally the MinION and our laptop stubbornly refused to talk to each other at first. Once that was resolved, we found that MinKNOW needed an internet connection to start sequencing the lambda phage, but we were offline. Luckily the lab wasn’t too far underground and we were able to get just enough signal by leaning a mobile phone on the windowsill to create a wifi hotspot for our laptop.
The MinION itself is a chassis which houses (semi-)disposable cartridges called flow cells:
A freshly opened flow cell, before it is loaded into the MinION chassis.
Prepared DNA is loaded into a flow cell, where it flows over a membrane speckled with nanopores. Probably the trickiest part of the whole experiment is preventing any bubbles from flowing across the membrane. The flow cell also contains the electronics to read the electrical signatures of DNA bases as they pass through the pores.
Loading our DNA sample onto a flow cell in the MinION.
A nice feature of the MinKNOW software is that it displays the status of how well each pore is functioning. The high number of pores labelled green on screen told us that our flow cell was in great shape.
Boasting about our "sea of green". @LaurenCowley4 tells us this is very good. #seaofgreen #MinION @nanopore pic.twitter.com/yP7m6oUR6m
— Science Practice (@sciencepractice) January 25, 2016
As soon as we started the sequencing run in MinKNOW we could see the data coming in, because the sequencing happens in real time. Our experiment ran for several hours overnight, but we did manage to disconnect our phone without interrupting it. The burn-in produced about 16,000 .fast5 files, which amount to 4GB. This is a pretty amazing amount of data considering that the lambda phage genome we sequenced, at roughly 48.5kb, is considered tiny. This gives an idea of the amount of data that can be generated by genetic sequencing, and the special processing requirements that are necessary for handling it.
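For a feel of what those numbers mean per read, here is a small Python sketch. The helper names are our own, and the folder layout (all .fast5 files sitting directly in one directory) is an assumption for illustration rather than a description of how MinKNOW organises its output.

```python
from pathlib import Path


def summarise_reads(folder, pattern="*.fast5"):
    """Count the files matching `pattern` in `folder` and total their sizes."""
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    mean_bytes = total_bytes / len(files) if files else 0.0
    return {"files": len(files), "total_bytes": total_bytes, "mean_bytes": mean_bytes}


def mean_file_size_kb(total_gb, file_count):
    """Average file size in kB, using decimal units (1 GB = 1,000,000 kB)."""
    return total_gb * 1_000_000 / file_count


# The figures quoted above: about 16,000 files adding up to about 4GB.
print(mean_file_size_kb(4, 16_000))  # 250.0
```

So each file comes to roughly 250 kB on average. Pointing `summarise_reads` at a run folder gives the same kind of summary from the files themselves.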
Nothing drives home #bigdata point like a folder of 16000 files from 1 small #genome sequence #SUCHBIGDATAMUCHWOW pic.twitter.com/txHPZGxMkB
— Science Practice (@sciencepractice) 28 January 2016
We are really excited to investigate how and where we can use MinION, especially as part of our focus on citizen science, resourceful engineering and bringing scientific analysis outside of the lab. Now that we’ve worked through the protocol in a fully equipped lab, we can see how it could be adapted for the field. Our excellent guide Lauren recently published a paper in Nature (Go Lauren!), describing how they used MinION to track Ebola in Guinea by packing all the required equipment into a flight suitcase. This gives us plenty of hope and inspiration to use the sequencer in difficult and unusual contexts. For our next adventures in sequencing, watch this space!
We’re excited to announce our new collaboration with the Humanitarian Innovation Fund (HIF)!
The HIF are doing some great work in supporting and funding projects that improve the effective delivery of humanitarian assistance in emergencies. While their scope is broad, the project we are developing in collaboration is focusing on the water, sanitation and hygiene (WASH) sector.
The aim of the project is to identify impactful and tangible innovation opportunities in six key WASH challenge areas. These areas were chosen following a Gap Analysis the HIF carried out in 2013 and include surface water drainage, sanitation provision, solid waste management, water treatment, and handwashing promotion.
This is a particularly exciting project for us as in addition to refining our process for identifying good problems, we have also been exploring the potential of different kinds of interventions, such as hackathons or sandpits, to support the development of new ideas and encourage new collaborations.
The HIF commissioned experts in each challenge area to write Problem Exploration Reports that explain the problem, highlight current approaches and limitations of existing solutions, and suggest areas for further exploration. Our first task on the project was to edit, structure and design these reports as part of a coherent series. The reports are now available on the HIF website here.
Designed and edited WASH Problem Exploration Reports.
Our current role on the project is to identify the most timely and impactful innovation opportunities within each of these problem areas and design a series of WASH innovation challenges for the HIF to launch throughout 2016. We are also delighted to be working with the HIF on the design and planning of an innovation workshop around the first topic of Surface Water Drainage in Emergencies. More about that to come!
Over the past 12 months we’ve been working closely with Nesta’s Centre for Challenge Prizes to design a set of 10 prizes for the European Commission. Our scope was broad – identify technical challenges or problems that require technological solutions.
It’s been an intense year with over 100 expert interviews, several dozen topics reviewed and a lot of lessons learned along the way. While the topics of the prizes are still under wraps, we thought we’d share some of the key things we learned from this process of finding good problems and transforming them into ambitious prizes. Here goes:
New, potentially disruptive technologies are often clouded by technical language which may limit their wider adoption. A prize can be helpful to clearly articulate the technology, offer legitimacy and introduce it to different communities.
Designing a prize that would operate in a billion-dollar industry is tough because million-dollar incentives can easily get lost in billion-dollar oceans. But financial incentives are not the only benefits of a prize. A prize can also raise awareness of the shortcomings of well-established technologies and potentially motivate innovators to fix them.
At a completely different scale, we designed prizes for technologies that are currently at a very low level of maturity but could potentially revolutionise entire fields. A prize can be helpful as a way of defining a common list of success criteria that people could work towards, as well as raising awareness. This would allow for a fair comparison of solutions and support future development.
Prizes can also shed light on who is working on a problem and how much real progress they have made. For one of the prize topics the experts we consulted said that, for decades, innovators who claimed to have solved a particular problem had also failed to translate their discoveries into reliable, marketable solutions. What they did manage to do, however, was to create the impression that the problem was solved and therefore dissuaded others from attempting their own solutions. A prize in this area would have the opportunity to open up the challenge to a wider audience of problem-solvers, set a clear deadline, and require those working on solutions to be upfront about the status of their innovations.
Drawing a line under these findings, what stands out is the need to ensure that the problems we are trying to solve are good ones. By ‘good’ we mean genuine problems that are clearly defined, problems that address a real need, and that stand a reasonable chance of being solved with the resources at hand. We’ve written about this before, but finding ourselves face-to-face with this conclusion once again points to the fact that defining a good problem requires asking the right questions and no small amount of effort. We will be spending this year looking at ways to formalise this process of finding good problems. Stay tuned!
After a lot of work on our Sequence Bundles visualisation method, we’re very excited to launch the Sequence Bundles web-app. The web application has been designed and developed by Science Practice, built by Joe Lau and is hosted by the European Bioinformatics Institute at the Wellcome Trust Genome Campus in Hinxton, UK. The application is part of our Sequence Bundles project, which also includes a gallery of Sequence Bundles visualisations.
Now that we have reached this milestone, we thought it would be interesting to review the process of designing and developing Sequence Bundles, so here is a short story of how the project unfolded.
Our Sequence Bundles project began in May 2013 when we entered the BioVis2013 redesign competition. The challenge was to come up with a new way of visualising multiple sequence alignments (MSAs) that would improve upon the long-standing sequence logo visualisation. The competition brief was well aligned with our own interests, so we decided to give it a go and prepare a submission.
First sketch-notes from our initial research into bioinformatics and sequence logos.
We started by learning about bioinformatics and finding out what it is that bioinformaticians are trying to find in their genomic or proteomic data when visualising MSAs. Initial desk research helped us cover the basics (reading Ivan Erill’s well written entry-level article was very helpful in understanding how information content is represented in sequence logos), but it was not until we went out to talk to bioinformatics researchers that we got a good understanding of the challenges they face.
In summer 2013 we met with Nick Goldman and Roland Schwarz who work together at the Goldman group at the European Bioinformatics Institute (EBI). They helped us understand the questions they would be asking when visualising MSAs. We learned about residue correlations and sequence motifs. We then started sketching intensively to explore the ways in which this information can be distilled from an MSA and represented in a new visualisation.
Early sketches — exploring ways to visualise sequence motifs and correlations.
Early Sequence Bundles sketches.
After exploring various approaches and testing how they work in hand-drawn sketches, we found that representing individual sequences as continuous lines looked promising. Combining a number of such lines in a single image would make an easy-to-interpret overview of all sequences in the MSA. Most importantly, this approach allowed us to retain all information about sequence content in each position in the MSA, and thus to provide a salient display of hidden sequence patterns. This is how it works in principle:
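The idea can also be sketched in a few lines of Python. This is a simplified illustration, not the code behind Sequence Bundles: the function names, the toy alignment and the order of residues on the Y-axis are all assumptions made for the example. Each aligned sequence becomes a list of (x, y) points, where x is the position in the alignment and y is the row given to the residue found there. Counting how many lines pass through each point shows where the lines bundle together.

```python
from collections import Counter


def sequence_to_polyline(sequence, y_order):
    """Turn one aligned sequence into (x, y) points.

    x is the alignment position; y is the row assigned to the residue
    found at that position, taken from its index in `y_order`.
    """
    row = {residue: i for i, residue in enumerate(y_order)}
    return [(x, row[residue]) for x, residue in enumerate(sequence)]


def bundle(msa, y_order):
    """Build one polyline per sequence and count the lines through each point."""
    if len({len(seq) for seq in msa}) > 1:
        raise ValueError("all sequences in an alignment must have the same length")
    polylines = [sequence_to_polyline(seq, y_order) for seq in msa]
    density = Counter(point for line in polylines for point in line)
    return polylines, density


# Toy alignment of three sequences; "-" (a gap) gets its own row.
msa = ["ACD", "ACE", "GCD"]
polylines, density = bundle(msa, y_order="ACDEG-")

print(polylines[0])     # [(0, 0), (1, 1), (2, 2)]
print(density[(1, 1)])  # 3: every sequence has C in the middle position
```

Drawing each polyline as a smooth curve, with heavier strokes where the density is high, gives the bundled look while every individual sequence stays traceable.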
At this point we named the project Sequence Bundles, which has proven to be a descriptive and clear name.
Moving on from sketching by hand to generating visualisations in code allowed us to scale up and find out whether Sequence Bundles would work with real-life bioinformatics data. After all, most bioinformatics research involves many more sequences than one can comfortably draw by hand. We built our first proof-of-concept Sequence Bundles generator in Processing and used it to explore the method further.
Sequence Bundles proof-of-concept prototype built in Processing.
Experiments in various data layouts and sequence representations (using insulin MSAs as sample datasets).
We used the BioVis2013 competition ADK_Lid data as input to generate Sequence Bundles images with our proof-of-concept tool. For the design of our visualisation method, we were interested in exploring various visualisation layouts, curvatures of the Bézier line, different line colouring and grouping schemes, the best ways to represent gaps in MSAs, as well as meaningful arrangements of amino acid residues placed on the Y-axis. Even though our design exploration was aimed at producing engaging Sequence Bundles visualisations, by playing with the display of the data we began to observe a number of interesting features in the MSA that had previously remained hidden. As we later learned, these features were also a surprise to the authors of that dataset.
We summarised our findings in our submission paper for the BioVis2013 contest, and later in an article in BMC Proceedings, published in the journal’s special issue reporting on the results of the BioVis2013 redesign competition.
Later in 2013 we presented our Sequence Bundles visualisation method at the IEEEVis2013 conference in Atlanta, GA — one of the most prestigious global visualisation conferences — and won an award at the BioVis2013 redesign competition.
Design experiments that helped us discover hidden sequence features in the ADK_Lid BioVis2013 dataset.
Encouraged by the results of the BioVis2013 competition, we decided to pursue the development of Sequence Bundles further.
In 2014 we won Innovate UK funding to support the next stage of our Sequence Bundles project. Following the discussion presented in our BMC Proceedings paper, our goal was to develop a web-based interactive visual analytics tool for bioinformatics that would employ Sequence Bundles as its main data visualisation method. We teamed up with the Goldman group from the EBI once again to work on this project.
Between August and December 2014 we worked intensively on building a rendering engine that would generate customisable Sequence Bundles visualisations. In parallel to that we also tested Sequence Bundles with bioinformatics researchers and experts around the world in our user research programme. This allowed us to start ongoing conversations with several scientific groups in both academia and industry that do great work in diverse areas, from healthcare to agri-tech. Recently, we also exhibited Sequence Bundles publicly at ‘The Art of Networks II’ show, at the famous New York Hall of Science in NY.
Our project concludes with the publication of this Sequence Bundles web visualisation tool. It is available to everyone who wants to use Sequence Bundles to visualise MSAs that contain up to 1000 sequences and are no longer than 1000 positions per sequence. The web tool is designed for the visualisation of protein datasets; however, in principle, Sequence Bundles can be applied to genomic or, in fact, any other sequential data types. Our web tool allows the user to import their FASTA-formatted data or to explore eight different example datasets. The visualisation is customisable, enabling the user to select various layouts, colour, data density and gap rendering options. Sequence Bundles images can be downloaded as PNG graphics and used in publications. We are also curating a gallery of interesting Sequence Bundles examples, so please get in touch if you want to share your visualisations with other Sequence Bundles users.
Example protein visualisations generated with the Sequence Bundles web application.
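If you want to check in advance whether an alignment fits within the web tool's limits of 1000 sequences and 1000 positions, a rough Python sketch like the one below may help. It is not part of the web application: the function names are hypothetical and the FASTA parsing is deliberately basic, assuming plain text with ">" header lines.

```python
MAX_SEQUENCES = 1000  # limits as stated for the web tool
MAX_POSITIONS = 1000


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_limits(records, max_sequences=MAX_SEQUENCES, max_positions=MAX_POSITIONS):
    """Return a list of human-readable problems; an empty list means it fits."""
    problems = []
    if len(records) > max_sequences:
        problems.append(f"{len(records)} sequences (limit {max_sequences})")
    longest = max((len(seq) for _, seq in records), default=0)
    if longest > max_positions:
        problems.append(f"longest sequence has {longest} positions (limit {max_positions})")
    return problems


example = ">seq1\nAC\nDE\n>seq2\nACDF\n"
records = parse_fasta(example)
print(records)                # [('seq1', 'ACDE'), ('seq2', 'ACDF')]
print(check_limits(records))  # []
```

Anything reported by `check_limits` would need trimming before import.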
Sequence Bundles will soon be available as a desktop application, with additional features and visualisation capabilities exceeding the 1000 sequences × 1000 positions limit of the web-app. We will make a separate announcement about this. If you wish to be notified once the desktop application is released, please let us know via email.
We are very interested in learning how Sequence Bundles are used by the scientific community. Please share your comments, ideas and feedback with us. As always, we are open to collaborations, so don’t hesitate to get in touch if you think we can work on something together.
It has been a busy few months here at SP and we thought we’d post this newsletter of sorts. Here’s what we’ve been up to:
Since January, we’ve been working on researching and designing 10 new Challenge Prizes. We’ll post a few more details about this project when we can make it more public. This has been keeping Ana very busy for most of the year so far. Thankfully we’ve had the support of two fantastic researchers. The first, Matteo Farinella, was with us for phase one of the project and we are continuing to work with Matteo in his capacity as an awesome illustrator. For phase two, we are very happy to have Chloe Ambery on board, an almost-graduate from Imperial College’s Science Communications MA programme.
Marek has been spending quite a bit of time on Sequence Bundles, working with Joe Lau to get a web-enabled version of the visualisation tool online. Marek (with James K) also hosted the industry panel at the BioJS conference where we met a great community of people who are collaborating to make the next generation of bioinformatics tools.
Still thinking about great speakers at #biojs15 meeting—thanks @GenomeAnalysis @repositiveio @BiojsLibrary @yannick__ pic.twitter.com/zDjYYItIGT
— Science Practice (@sciencepractice) July 14, 2015
Tempest has been working on an internal project to explore the potential of low-tech microfluidics for diagnostic applications and we’ve been having a lot of interesting discussions with a variety of researchers and companies. Tempest gave a talk about this project at a recent TEDx event.
James G has also donned the cheek microphone for a talk he gave about Science Practice at a recent event organised by the Knowledge Transfer Network.
.@James_Godwin_ @sciencepractice talks about citizen science + how citizens can help innovate @KTN_Creative #ND30yrs pic.twitter.com/gQClPHS6LS
— Dr. Mitra Memarzia (@mitra_m) July 3, 2015
Lastly, we’ve been accepted onto Oxford Nanopore’s MinION Access Programme (MAP), which is enormously exciting. We are awaiting some essential supplies before we can fire up the diminutive MinION sequencer, but we are looking forward to seeing what it can do first-hand. More soon!