On Wednesday the 2nd of November, Hentsū hosted keen minds from around the financial technology sector for a talk about both the benefits and the implementation of grid computing clusters. The talk aimed to give attendees a taste of what Grid Computing has to offer by:
Discussing how grid computing clusters operate
Providing worked examples of Google Datalab and MATLAB Distributed Computing toolkit
What is Grid Computing?
For the uninitiated, Grid Computing refers to making use of the shared power of a cluster of computers to process computationally intensive tasks that would otherwise bog down a single workstation. Jobs are submitted from one computer to the grid, which processes the data and returns the output to the user. Importantly, multiple people can use the same grid simultaneously, because the grid intelligently manages how its computers allocate resources between jobs. This can significantly improve workflow for companies whose work is suited to grid computing.
The talk opened with a much-needed refresher on public and private cloud and defined key terminology, then flowed into an investigation of the challenges companies face when working with different data sets and a look at the solutions currently available to them. This investigation, backed up by the live demonstrations, left those who attended with a solid understanding of what Grid Computing can do for them.
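To make the submit, process, return cycle described above concrete, here is a minimal Python sketch. It is an illustration only, not code from the talk: a local thread pool stands in for the grid, and the names sum_of_squares, submit_jobs and run_demo, along with the two users, are hypothetical. A real grid would dispatch jobs to separate machines through a scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(n: int) -> int:
    """Hypothetical stand-in for a computationally intensive job."""
    return sum(i * i for i in range(n))


def submit_jobs(grid, sizes):
    """Submit one job per input size and return the pending results (futures)."""
    return [grid.submit(sum_of_squares, n) for n in sizes]


def run_demo():
    # Assumption: a local pool of 4 workers plays the role of the grid.
    # Both users share the same workers; the pool decides which job runs where.
    with ThreadPoolExecutor(max_workers=4) as grid:
        alice_jobs = submit_jobs(grid, [10, 100, 1000])
        bob_jobs = submit_jobs(grid, [5, 50])
        # Each user collects only the output of the jobs they submitted.
        alice_results = [job.result() for job in alice_jobs]
        bob_results = [job.result() for job in bob_jobs]
    return alice_results, bob_results


if __name__ == "__main__":
    alice_results, bob_results = run_demo()
    print("Alice:", alice_results)
    print("Bob:", bob_results)
```

The point of the sketch is that neither user manages individual workers: they hand jobs to the shared pool and wait for results, which is the same division of labour a grid scheduler provides at larger scale.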
Traditional tools and PaaS Offerings
There is a breadth of choice in grid computing tools and in how to migrate workloads into cloud environments. We covered some of those options, ranging from traditional MATLAB setups to more extreme Platform as a Service (PaaS) environments from Google using BigQuery and Datalab, and ran some live demos ripping through 2 TB of full depth market data.
Key points to take away:
Horses for courses - What machine specification does your task run best on? Many cores in a few large machines, or the same work spread across many smaller machines? Consider how best to configure your cluster.
Reduced upfront costs - Traditional Grid Computing clusters require every machine to be purchased before the cluster can be used. Cloud solutions let you avoid that upfront hardware spend.
Flexibility - Publicly available solutions can allow you to quickly scale the size and power of your cluster to ensure that you can crunch the data in the time you need.
Reducing worker downtime - Having employees sitting around waiting for their code to execute wastes time and money. Pushing tasks off individual computers and onto the cloud alleviates the bottleneck, so employees can get on with other work.
Full Platform as a Service (PaaS) - Allows very dynamic and fast access to compute across vast data sets, but will usually require significant re-tooling for major hedge fund production environments. Implementing PaaS-type grid computing also calls for a more holistic approach across people, processes and technology.
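As a rough illustration of the flexibility point above, the sketch below estimates how many workers a cluster would need to finish a batch of jobs by a deadline. It is a back-of-the-envelope aid under stated assumptions, not a sizing method from the talk: the function name workers_needed and the example figures are hypothetical, every task is assumed to take the same time, and tasks are assumed to be fully independent with no scheduling or data transfer overhead.

```python
import math


def workers_needed(total_tasks: int, seconds_per_task: float, deadline_seconds: float) -> int:
    """Estimate the workers required to finish all tasks before the deadline.

    Assumes identical, independent tasks and one task per worker at a time.
    """
    if seconds_per_task <= 0 or deadline_seconds <= 0:
        raise ValueError("task time and deadline must be positive")
    # Whole tasks a single worker can complete before the deadline.
    tasks_per_worker = int(deadline_seconds // seconds_per_task)
    if tasks_per_worker < 1:
        raise ValueError("a single task takes longer than the deadline")
    return math.ceil(total_tasks / tasks_per_worker)


if __name__ == "__main__":
    # Hypothetical figures: 1,000 tasks of 30 seconds each, due within one hour.
    print(workers_needed(1000, 30, 3600))  # prints 9
```

With a cloud cluster, tightening the deadline in this estimate translates into temporarily adding workers rather than buying more machines.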
We've built up a wealth of in-house expertise running grid computing workloads across all three major public clouds - Amazon AWS, Microsoft Azure and Google Compute Engine. We can get you up and running quickly with pre-tested designs and architectures, greatly reducing the traditional pains and TCO of running grid computing. Get in touch: firstname.lastname@example.org. We'd love to hear about your grid computing challenges.