elPrep: smarter, faster DNA sequence analysis software

Streamline your genomic research with an integrated tool that excels in speed and accuracy.

If genome sequencing is an important part of your medical practice or research, time is all too often not on your side. After identification of the individual bases through sequencing hardware, hundreds of gigabytes of data need to be processed to reconstruct the DNA sequence and flag variants that might indicate genetic disorders.

It’s a procedure that typically involves a series of DNA sequence analysis software tools and takes a lot of time, hampering your research and delaying your results. Unless, that is, you speed up the process with elPrep, which supports variant calling as of version 5.

Faster DNA sequence analysis

elPrep is a DNA sequence analysis software solution that’s up to sixteen times faster than other programs on similar computing hardware, all without using expensive GPU or FPGA acceleration. The reason for this remarkable increase in speed? A smart software architecture that:

  • combines the processing of multiple genome sequencing preparation steps and parallelizes their execution
  • optimizes memory management
  • minimizes the number of I/O accesses
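
To make the first and third points concrete, the Python sketch below contrasts running preparation steps as separate passes over the data with running them fused in a single pass. It is a toy model under stated assumptions, not elPrep's implementation: the record format, the step functions `drop_unmapped` and `tag_sample`, and the pass counter are all hypothetical, and parallel execution is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
# A step returns the (possibly modified) record, or None to drop it.
Step = Callable[[Record], Optional[Record]]


def drop_unmapped(rec: Record) -> Optional[Record]:
    """Hypothetical step: discard reads that did not align."""
    return rec if rec["mapped"] else None


def tag_sample(rec: Record) -> Optional[Record]:
    """Hypothetical step: attach a sample label to each read."""
    tagged = dict(rec)
    tagged["sample"] = "S1"
    return tagged


def run_separately(records: List[Record], steps: List[Step]) -> Tuple[List[Record], int]:
    """One full pass over the data per step, as in a chain of separate tools."""
    passes = 0
    for step in steps:
        kept = []
        for rec in records:
            out = step(rec)
            if out is not None:
                kept.append(out)
        records = kept  # stands in for an intermediate file between tools
        passes += 1
    return records, passes


def run_fused(records: List[Record], steps: List[Step]) -> Tuple[List[Record], int]:
    """Apply every step to each record during a single pass over the data."""
    result = []
    for rec in records:
        out: Optional[Record] = rec
        for step in steps:
            out = step(out)
            if out is None:  # record was filtered out, skip remaining steps
                break
        if out is not None:
            result.append(out)
    return result, 1


reads = [
    {"name": "r1", "mapped": True},
    {"name": "r2", "mapped": False},
    {"name": "r3", "mapped": True},
]
steps = [drop_unmapped, tag_sample]

separate, n_separate = run_separately(reads, steps)
fused, n_fused = run_fused(reads, steps)
print("same output:", separate == fused)
print("passes over the data:", n_separate, "vs", n_fused)
```

Both variants produce the same records, but the fused version reads the data once regardless of how many steps are chained, which is the intuition behind combining steps and reducing I/O.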

Figure 1. WGS Benchmark*. Runtime, peak RAM, and disk use in GATK 4 (colored) vs. elPrep 5 (grey). Runtime and resource use for GATK 4 are shown per pipeline step, whereas all steps are combined into a single data point for elPrep 5. The GATK 4 runs were executed for both versions of the haplotype caller algorithm it implements. Compared to GATK 4, elPrep 5 executes the pipeline 8.5-16x faster, using approximately 0.70x of the RAM and approximately 0.70x of the disk space that GATK4® uses. The outputs of elPrep are identical to the GATK outputs. (*50x NA12878 Illumina Platinum genome, hg38, run on AWS m5.24xlarge, Intel Xeon, 96 vCPU, 384 GiB RAM)

Comprehensive DNA sequence analysis software

elPrep is developed by ExaScience Life Lab, a division of imec that focuses on scalable software solutions for data-intensive and high-performance computing problems, primarily in life sciences. Thanks to this expertise, elPrep produces results equivalent to those of established state-of-the-art genome analysis programs such as SAMtools, Picard and GATK4®.

Moreover, elPrep seamlessly replaces all these other tools, including for variant calling, giving you a single, ultra-fast solution for a large part of the DNA sequence analysis process.

Figure 2. AWS WGS Scaling Benchmark**. The graph shows the runtime (left) and the dollar cost (right) for running the variant calling pipeline on a variety of AWS server instances. The fastest elPrep run is more than 8x faster than GATK at roughly the same price. Concretely, elPrep processes the WGS sample in under 6 hours for approximately 32 dollars. (**m5.2xlarge: 8 vCPU, 32 GiB, 0.46$/hour. m5.16xlarge: 64 vCPU, 256 GiB, 3.68$/hour. m5.24xlarge: 96 vCPU, 384 GiB, 5.52$/hour. September 2020 prices for EU Frankfurt.)
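
The cost figures follow from multiplying runtime by the hourly instance price. The short Python sketch below reproduces that arithmetic with the prices from the footnote. It assumes that the fastest run used the m5.24xlarge instance (the caption does not say so explicitly) and reads the m5.2xlarge price as 0.46 dollars per hour. The function names are illustrative only.

```python
# Hourly on-demand prices in dollars, taken from the figure footnote
# (EU Frankfurt, September 2020). The m5.2xlarge value is read as 0.46.
HOURLY_PRICE = {
    "m5.2xlarge": 0.46,
    "m5.16xlarge": 3.68,
    "m5.24xlarge": 5.52,
}


def run_cost(instance: str, hours: float) -> float:
    """Cost in dollars of running `instance` for `hours` hours."""
    return HOURLY_PRICE[instance] * hours


def implied_hours(instance: str, dollars: float) -> float:
    """Runtime in hours that a given bill corresponds to on `instance`."""
    return dollars / HOURLY_PRICE[instance]


# Assumption: the fastest elPrep run was on m5.24xlarge.
upper_bound = run_cost("m5.24xlarge", 6.0)
runtime = implied_hours("m5.24xlarge", 32.0)

print(f"cost ceiling for a run under 6 hours: {upper_bound:.2f} dollars")
print(f"runtime implied by a 32 dollar bill: {runtime:.1f} hours")
```

Under that assumption, a run of just under 6 hours costs at most about 33 dollars, and a bill of 32 dollars corresponds to roughly 5.8 hours, which is consistent with the figures quoted in the caption.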

Want to use elPrep?

elPrep is offered by imec under a dual license: AGPL v3 or the elPrep Premium License. See our terms of use page for detailed information.

elPrep is distributed as a single binary that incorporates all of its functionality and is easy to install and use. It’s written in Go, an open-source programming language. It doesn't require GPU or FPGA accelerators and can therefore run on any standard server, on-premises or in the cloud.

The elPrep source code is freely available on GitHub, and various publications describe its behavior in full detail and compare it to other tools.

Need customization or support? Don’t hesitate to contact us.

Discover elPrep Premium

Publications