Case study

Elastic Cloud Processing Engine for Genetic Data

A company that specializes in genome processing had hit the limits of its computing capacity. It needed genome processing to be fully automated and moved to the cloud.

We built a scalable, cloud-native, and cloud-agnostic service for executing genome processing jobs.

Business challenge

Our client is an innovative genome processing startup and the world’s largest DNA testing and analysis platform. The company aims to provide end users with a platform to store, use, and understand their genetic data easily. The platform consists of components that securely upload genetic data files of any size directly into the user’s account, a genome processing engine, and various APIs for third-party integration. To accelerate growth and handle an increasing number of users, our client was looking for an established digital health technology enabler to build a high-performance, elastic cloud system for batch genome processing that would scale out automatically with demand.

The system requirements were well defined: the target system had to control computational resources at a fine-grained level for each genome processing job so that jobs could be treated differently according to business rules. For instance, jobs from premium clients should receive more resources and higher priority. Our client needed a trusted technology partner to provide top-notch software engineering services in the cloud and distributed systems domain.

The urgency of handling a growing load led to tight implementation timeframes and demanded exceptional technical expertise to build a high-performance, resilient, and HIPAA-compliant data processing infrastructure incorporating a series of digital solutions and services. As a result, our client turned to Plexteq, a proven health technology expert proficient in medical software development. After a small trial project that demonstrated our strong product development and delivery management capabilities, our client entered into an extensive, long-term strategic partnership with Plexteq.
 

Key challenges

  1. Existing genome job processing on in-lab equipment was slow and inefficient

  2. The existing process was semi-automated and required manual intervention

  3. The existing genome processing environment was not HIPAA compliant

Solution delivered

Genome processing is generally a computationally heavy process. Depending on the kind of processing, a single job may consume many hours or even days of CPU time, along with substantial memory and I/O. Therefore, the main idea of our R&D project was to:

  • Leverage public clouds, where more computational power can be allocated on demand as soon as end users need to perform analytical processing

  • Use advanced containerization techniques with fine-grained resource control so that computational resources are strictly partitioned between jobs and no job can hog system resources or cause other jobs to stall
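
The second idea, strict per-job resource limits, can be illustrated with Docker's standard --cpus and --memory flags. The sketch below is a minimal illustration rather than the production code: the JobLimits structure, the two tiers, and the image name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class JobLimits:
    """Per-job resource caps (hypothetical structure, not from the project)."""
    cpus: float       # number of CPU cores the job may use
    memory_gib: int   # hard memory ceiling in GiB


def docker_run_command(image: str, limits: JobLimits, job_args: list[str]) -> list[str]:
    """Build a `docker run` argument list that caps CPU and memory for one job."""
    return [
        "docker", "run", "--rm",
        f"--cpus={limits.cpus}",           # CPU cap enforced by the container runtime
        f"--memory={limits.memory_gib}g",  # hard memory limit for the container
        image,
        *job_args,
    ]


# Hypothetical tiers: premium jobs get a larger slice than standard ones.
STANDARD = JobLimits(cpus=2.0, memory_gib=8)
PREMIUM = JobLimits(cpus=8.0, memory_gib=32)

command = docker_run_command("genome-align:latest", PREMIUM, ["--sample", "S1"])
print(" ".join(command))
```

Because each container receives an explicit cap, a heavy job cannot starve its neighbors on the same node, and the caps themselves can vary per customer tier.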

 

It was essential to avoid vendor lock-in so that the solution could run on any public cloud or on on-premises infrastructure.

The Plexteq engineering team developed a cloud-native and cloud-agnostic turnkey service for executing genome processing jobs.
 

The developed solution allowed smooth integration into the customer’s existing infrastructure through the REST API, enabling a fully automated genome processing pipeline. The previous solution was semi-automated: it involved manual steps and ran on resource-constrained in-lab equipment.

 

The solution includes several modules:

 

  • Cloud scheduler – Responsible for allocating and recycling cloud resources, and for ensuring that cloud costs do not exceed the daily, weekly, or monthly thresholds

  • Cloud controller – Ensures that genome processing jobs are executed reliably and in the proper order

  • Node controller – Starts and stops jobs at the request of the cloud controller, allocates system resources for job execution, retries the job if it fails, and controls the execution process
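
To make the cloud controller's role more concrete, here is a minimal sketch of how jobs could be dispatched in dependency order while favoring higher-priority work. It is an illustration under stated assumptions, not the actual implementation: the job names, the numeric priority scheme (lower number runs first), and the dispatch_order function are all hypothetical.

```python
from __future__ import annotations

import heapq


def dispatch_order(jobs: dict[str, tuple[int, list[str]]]) -> list[str]:
    """Return job ids in execution order.

    `jobs` maps a job id to (priority, dependencies). A job becomes ready
    once all its dependencies have run; among ready jobs, the lowest
    priority number is dispatched first.
    """
    remaining = {job_id: set(deps) for job_id, (_, deps) in jobs.items()}
    dependents: dict[str, list[str]] = {job_id: [] for job_id in jobs}
    for job_id, (_, deps) in jobs.items():
        for dep in deps:
            dependents[dep].append(job_id)

    # Min-heap of (priority, job id) for jobs with no unmet dependencies.
    ready = [(jobs[j][0], j) for j, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, job_id = heapq.heappop(ready)
        order.append(job_id)
        for child in dependents[job_id]:
            remaining[child].discard(job_id)
            if not remaining[child]:
                heapq.heappush(ready, (jobs[child][0], child))

    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical two-step pipelines for a premium (0) and a standard (1) customer.
pipeline = {
    "align-standard": (1, []),
    "align-premium": (0, []),
    "call-standard": (1, ["align-standard"]),
    "call-premium": (0, ["align-premium"]),
}
print(dispatch_order(pipeline))
```

In this example the premium pipeline runs to completion before the standard one starts, while each variant-calling step still waits for its own alignment step.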

In addition to the REST API, we also developed a rich web interface that allowed our client to observe jobs, system resources, and various system events, and to manage execution at runtime.

Key features

Cloud-agnostic service with broad public cloud support

HIPAA compliance

Cost-based predictor and workload optimizer

Industries: Biotech, Healthcare
Expertise: Big Data, Cloud Services 
Market: Global
Team size: 8 engineers
Cooperation: 2016 – 2020

Technologies:
Java, Docker, LXC, Bash, Tomcat, PostgreSQL, Azure, Amazon, Google Cloud

Overall, the developed solution allowed our customer to:

  • Execute jobs in public clouds and on-premises environments

  • Schedule and dispatch jobs through a REST API and manually via a web interface

  • Manage job execution order for complex processing pipelines

  • Manage job priorities to process tasks from paying customers faster

  • Manage and monitor cloud resource utilization

  • Plan and manage cloud expenses in a predictable way

  • Adjust computing resources for every job (CPU cores and RAM)

  • Report errors if processing fails

  • Automatically recover from soft failures and restart processing without manual intervention
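
The last two capabilities, error reporting and automatic recovery from soft failures, can be sketched as a retry loop with exponential backoff. This is an illustrative sketch only: the SoftFailure and HardFailure categories, the attempt limit, and the delays are assumptions, not details of the delivered system.

```python
import time


class SoftFailure(Exception):
    """Transient error (hypothetical category), e.g. a node was lost."""


class HardFailure(Exception):
    """Permanent error (hypothetical category), e.g. a corrupt input file."""


def run_with_retries(job, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run job(); retry on SoftFailure with exponential backoff.

    Any other exception, including HardFailure, propagates immediately so
    that it can be reported instead of retried. A SoftFailure on the final
    attempt is re-raised as well.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except SoftFailure:
            if attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: a job that fails twice with a soft error and then succeeds.
attempts = []


def flaky_job():
    attempts.append(1)
    if len(attempts) < 3:
        raise SoftFailure("node lost")
    return "done"


delays = []
result = run_with_retries(flaky_job, sleep=delays.append)  # record delays instead of sleeping
print(result, len(attempts), delays)
```

Passing the sleep function as a parameter keeps the sketch easy to test; a real node controller would also need to clean up the failed container before restarting the job.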

The developed solution also enabled HIPAA compliance for the most business-critical component.

Business outcome

It's difficult to overstate the importance of speed in business. With the pace at which society progresses, companies have to do whatever it takes to stay relevant.

 

The platform Plexteq developed unlocks the power of big data in genomics and bioinformatics. Our product handles thousands of concurrent genome processing pipelines, delivering results exceptionally fast to end users and research labs. 

 

Key results achieved are:

  1. High-speed fully automated genome processing pipelines

  2. HIPAA compliance 

  3. Major user experience improvement – users started getting results around 10 times faster

  4. A cost control and alerting system that monitors cloud computing expenses in real time
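
As a rough illustration of the cost control idea, the sketch below compares current spend against daily, weekly, and monthly thresholds and flags windows that are close to or over budget. The figures, the 80% alert ratio, and the check_budget function are hypothetical and are not taken from the delivered system.

```python
from __future__ import annotations


def check_budget(spend: dict[str, float],
                 limits: dict[str, float],
                 alert_ratio: float = 0.8) -> dict[str, str]:
    """Classify each budget window as 'ok', 'alert', or 'exceeded'.

    `spend` and `limits` map a window name to an amount of money.
    A window raises an alert once spend reaches `alert_ratio` of its limit.
    """
    status = {}
    for window, limit in limits.items():
        used = spend.get(window, 0.0)
        if used >= limit:
            status[window] = "exceeded"   # stop allocating new cloud resources
        elif used >= alert_ratio * limit:
            status[window] = "alert"      # notify operators before the limit is hit
        else:
            status[window] = "ok"
    return status


# Hypothetical figures in an arbitrary currency.
limits = {"daily": 100.0, "weekly": 500.0, "monthly": 2000.0}
spend = {"daily": 90.0, "weekly": 300.0, "monthly": 2100.0}
print(check_budget(spend, limits))
```

A scheduler could run such a check before each allocation and refuse to start new nodes while any window is marked as exceeded.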

 

With this solution, Plexteq is ready to help clients extract value from big data in bioinformatics and genomics. Our team of professional big data engineers is in a strong position to implement genome processing platforms with custom functionality tailored to specific business needs.
