Case study

Elastic Cloud Processing Engine for Genetic Data

A company that specializes in genome processing had hit the limits of its computing capacity. It needed genome processing to be fully automated and moved to the cloud.

We built a scalable, cloud-native, and cloud-agnostic service for executing genome processing jobs.

Business Challenge

Our client is an innovative genome processing startup and the world’s largest DNA testing and analysis platform. The company aims to provide end users with a platform to store, use, and understand their genetic data easily. The platform consists of components that securely upload genetic data files of any size directly into the user’s account, a genome processing engine, and various APIs for third-party integration. To accelerate growth and handle an increasing number of users, our client was looking for an established digital health technology enabler to build a high-performance, elastic cloud system for batch genome processing that scales out automatically depending on demand.

The system requirements were well defined. The target system had to control computational resources in a fine-grained way for each genome processing job, so that jobs could be treated differently according to business rules. For instance, jobs from premium clients should be given more resources and prioritized. Our client required a trusted technology partner to provide top-notch software engineering services in the cloud and distributed systems domain.

The urgency of satisfying an increasing load resulted in tight implementation timeframes and the need for exceptional technical expertise to build a high-performance, resilient, and HIPAA-compliant data processing infrastructure, incorporating a series of digital solutions and services. As a result, our client turned to Plexteq as a proven health technology expert proficient in medical software development. After a small trial project that demonstrated our strong product development and delivery management capabilities, our client entered into an extensive, long-term strategic partnership with Plexteq.

Key Challenges

Existing genome job processing on in-lab equipment was slow and inefficient

The existing process was semi-automated and required manual intervention

The existing genome processing environment was not HIPAA-compliant

Solution Delivered

Genome processing is generally computationally heavy. Depending on the kind of processing, a single job may consume many hours or even days of CPU time, along with substantial memory and I/O. Therefore, the main idea of our R&D project was to:

Leverage public clouds, where more computational power can be allocated on demand as soon as end users need to perform analytical processing
Use advanced containerization techniques with fine-grained resource control, so that computational resources are strictly partitioned between jobs and no job can hog system resources or cause other jobs to stall
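As an illustration of per-job resource control, the sketch below builds a `docker run` command line that caps CPU and memory with Docker's standard `--cpus` and `--memory` flags. It is a minimal sketch, not the project's actual implementation: the `JobLimits` type, the `docker_run_args` helper, the image name, and the specific limits are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobLimits:
    """Hypothetical per-job resource envelope (names are illustrative)."""
    cpus: float      # number of CPU cores the container may use
    memory_gb: int   # hard RAM ceiling in gigabytes


def docker_run_args(image: str, limits: JobLimits) -> list[str]:
    """Build a `docker run` command line that caps CPU and memory."""
    return [
        "docker", "run", "--rm",
        f"--cpus={limits.cpus}",          # limit CPU cores for this job
        f"--memory={limits.memory_gb}g",  # limit RAM for this job
        image,
    ]


# Assumed example: a premium job receives a larger envelope than a standard one.
standard = docker_run_args("genome-job:latest", JobLimits(cpus=2, memory_gb=8))
premium = docker_run_args("genome-job:latest", JobLimits(cpus=8, memory_gb=32))
```

Because each container gets its own hard limits, one heavy job cannot starve the others that share the same node.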

It was essential to build the solution vendor-lock-free, making it possible to run on any public cloud or on-premises infrastructure.

The Plexteq engineering team developed a cloud-native and cloud-agnostic turnkey service for executing genome processing jobs.

The developed solution integrated smoothly into the customer’s existing infrastructure through the REST API, enabling a fully automated genome processing pipeline. The client’s previous solution was semi-automated: it involved manual actions and ran on resource-bound in-lab equipment.

The solution includes several modules:

Cloud scheduler – Responsible for allocating and recycling cloud resources, and for ensuring that cloud costs do not exceed the daily/weekly/monthly thresholds
Cloud controller – Ensures that genome processing jobs are executed reliably and in the proper order
Node controller – Starts and stops jobs at the request of the cloud controller, allocates system resources for job execution, retries a job if it fails, and controls the execution process
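The cost-threshold check performed by a cloud scheduler can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual scheduler: the `Budget` type, the `can_allocate` function, and the window names and amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical spend limits and running totals, in one currency."""
    limits: dict[str, float]  # maximum spend per window, e.g. "daily"
    spent: dict[str, float]   # amount already spent in each current window


def can_allocate(budget: Budget, estimated_cost: float) -> bool:
    """Return True only if the extra cost fits within every threshold."""
    return all(
        budget.spent.get(window, 0.0) + estimated_cost <= limit
        for window, limit in budget.limits.items()
    )


# Assumed example figures.
budget = Budget(
    limits={"daily": 100.0, "weekly": 500.0, "monthly": 1500.0},
    spent={"daily": 90.0, "weekly": 200.0, "monthly": 900.0},
)
small_node_ok = can_allocate(budget, 5.0)   # fits under all three limits
large_node_ok = can_allocate(budget, 20.0)  # would exceed the daily limit
```

A scheduler built this way would allocate a new node only when the check passes, and otherwise queue the job until a window resets or resources are recycled.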

In addition to the REST API, we also developed a rich web interface that allowed our client to observe jobs, system resources, and various system events, and to manage execution at runtime.

Key Features

Cloud-agnostic service with broad public cloud support

HIPAA compliance

Cost-based predictor and workload optimizer

Project Highlights

Industries: Biotech, Healthcare
Expertise: Big Data, Cloud Services
Market: Global
Team size: 8 engineers
Cooperation: 2016 – 2020

Technologies:
Java, Docker, LXC, Bash, Tomcat, PostgreSQL, Azure, Amazon, Google Cloud

Overall, the developed solution allowed our customer to:

Execute jobs in public clouds and on-premises environments
Schedule and dispatch jobs through a REST API and manually via a web interface
Manage job execution order for complex processing pipelines
Manage job priorities to process tasks from paying customers faster
Manage and monitor cloud resource utilization
Plan and manage cloud expenses in a predictable way
Adjust computing resources for every job (CPU cores and RAM)
Report errors if processing fails
Automatically recover from soft failures and restart processing without manual intervention
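Priority-based dispatch of the kind listed above can be illustrated with a small priority queue. The sketch below is an assumption-laden example rather than the delivered design: the `JobQueue` class, its method names, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools
from typing import Optional


class JobQueue:
    """Hypothetical dispatch queue: lower priority number runs first."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        # The counter breaks ties so equal-priority jobs stay in FIFO order.
        self._counter = itertools.count()

    def submit(self, job_id: str, priority: int) -> None:
        """Add a job; priority 0 is the most urgent in this sketch."""
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def next_job(self) -> Optional[str]:
        """Pop the most urgent job, or return None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = JobQueue()
queue.submit("free-1", priority=1)
queue.submit("premium-1", priority=0)
queue.submit("free-2", priority=1)
order = [queue.next_job() for _ in range(3)]
```

Here the premium job is dispatched first even though it was submitted second, while the two free-tier jobs keep their submission order.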

The developed solution also enabled HIPAA compliance for the most business-critical component.

Business Outcome

It's difficult to overstate the importance of speed in business. With the pace at which society progresses, companies have to do whatever it takes to stay relevant.

The platform Plexteq developed unlocks the power of big data in genomics and bioinformatics. Our product handles thousands of concurrent genome processing pipelines, delivering results exceptionally fast to end users and research labs.

Key results achieved are:

High-speed, fully automated genome processing pipelines
HIPAA compliance
Major user experience improvement – users started getting results around 10 times faster
A cost control and alerting system that monitors cloud computing expenses in real time

With this solution, Plexteq is ready to address our clients’ needs to receive value from big data related to bioinformatics and genomics. Our team of professional big data engineers is in a strong position to implement genome processing platforms with custom functionality tailored for specific business needs.

Let Us Discuss How Our Team Can Contribute To Your Success

- Ahtri tn 12, Tallinn, Estonia
- 18 Yunosti ave., Vinnytsia, Ukraine
- 275 New North Road, London, England

info@plexteq.com

+372 6 10 42 43
+380 67 395 35 34
