Hentsū built this Azure Data Factory case study to highlight the benefits of both the cloud and Microsoft ADF. A client recently approached us with a data science challenge around one of their data sets. The data was provided to the client in an AWS environment, in a Redshift data warehouse. While Redshift was fast, the client found it very expensive: its storage and compute costs are coupled, so a large data set forces a high compute spend even when the analysts do not need that level of speed.
However, the data was also available in CSV format in an S3 storage bucket, which could be the starting point of a new approach. The client already had all their infrastructure deployed and managed by Hentsū in Azure, so they wanted to consolidate into the existing infrastructure.
After reviewing the challenges, we designed an elegant solution that leverages the power and scale of the cloud in a way that is simply not possible with traditional infrastructure.
[/et_pb_text][/et_pb_column][et_pb_column type="2_5" module_class="ds-vertical-align" _builder_version="3.25" custom_padding="|||" custom_padding__hover="|||"][et_pb_text _builder_version="4.1" background_color="#ffbb22" custom_margin="0px|||0px|false|false" custom_padding="20px|20px|15px|20px|false|false"]Hentsū recommended a solution built on Azure Data Factory (ADF), Microsoft's Extract-Transform-Load (ETL) solution for Azure. While there are many ETL solutions that can run on any infrastructure, this is very much a native Azure service. It also easily ties into the other services Microsoft offers.
The key functionality is the ability to define data-movement pipelines in a web user interface and to set triggers, which can be either event based (such as the creation of a new file) or time based. Azure then handles the execution of the pipelines that process the data. Pipeline creation requires relatively little coding experience, which makes it easy to delegate to staff with less technical background.
In this particular Azure Data Factory case study, Hentsū built out the data pipelines to move the data from AWS into Azure. The initial load was triggered manually, and update schedules were then set to check for new files at regular intervals, as sketched below.
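Purely as an illustration of what such a definition involves, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. Every name here (subscription, resource group, factory, datasets) is hypothetical, and generic Blob source/sink classes stand in for the real S3-to-warehouse connectors; the production pipelines were defined in the ADF web UI rather than in code.

```python
# Illustrative sketch only: a copy pipeline plus an hourly schedule trigger
# defined via the Azure Data Factory Python SDK. Resource names and the
# pre-created datasets ("S3CsvFiles", "SqlDwStaging") are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineReference,
    PipelineResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, TriggerResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "example-rg", "example-adf"

# One copy step from the source dataset to the warehouse staging dataset.
copy_step = CopyActivity(
    name="CopyCsvToWarehouse",
    inputs=[DatasetReference(reference_name="S3CsvFiles")],
    outputs=[DatasetReference(reference_name="SqlDwStaging")],
    source=BlobSource(),
    sink=BlobSink(),
)
adf.pipelines.create_or_update(
    rg, factory, "LoadTradingData", PipelineResource(activities=[copy_step])
)

# Time-based trigger: check for new files every hour.
trigger = ScheduleTrigger(
    recurrence=ScheduleTriggerRecurrence(frequency="Hour", interval=1),
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="LoadTradingData")
    )],
)
adf.triggers.create_or_update(
    rg, factory, "HourlyCheck", TriggerResource(properties=trigger)
)
```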
Hentsū created status tables to track each file as it passed through the pipelines. Combined with a decoupled structure, this meant troubleshooting or manual intervention could happen at any stage of the process without creating dependencies: individual files and steps could be fixed in isolation while the rest of the pipelines continued uninterrupted. The clean decoupling also meant any error on a particular step was easily identified and flagged to users for investigation.
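The schema itself is not part of the case study, but as a loose sketch of the pattern, a status table keyed by file name can be updated as each stage completes (all names below are invented for illustration):

```python
# Illustrative sketch of the file-status pattern: one row per file, updated
# as it moves through the pipeline stages. Table, column, and DSN names are
# hypothetical; pyodbc stands in for whichever database driver is used.
import pyodbc

conn = pyodbc.connect("DSN=example_warehouse")  # hypothetical DSN

def set_file_status(file_name, stage, status):
    """Record the latest stage and status for a file, so a failed step can
    be retried in isolation without blocking the rest of the pipeline."""
    cur = conn.cursor()
    cur.execute(
        "UPDATE file_status SET stage = ?, status = ?, updated_at = GETDATE() "
        "WHERE file_name = ?",
        stage, status, file_name,
    )
    if cur.rowcount == 0:  # first time this file has been seen
        cur.execute(
            "INSERT INTO file_status (file_name, stage, status, updated_at) "
            "VALUES (?, ?, ?, GETDATE())",
            file_name, stage, status,
        )
    conn.commit()

set_file_status("trades_2020_11.csv", "copy", "succeeded")
```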
All the data was then mapped back to these status tables, so it could be used if further processing or cleaning of the final tables was ever needed. The data was further transformed with additional schema changes to match the client's end use and to map it to their traditional trading data.
The pipelines were deliberately abstracted so that adding new data sources in the future requires the least amount of work, making it easy for the client's end users to do this themselves as and when required.
ADF runs completely within Azure as a native serverless solution. There is no need to worry about where the pipelines run, which instance types to choose up front, managing servers or operating systems, configuring networking, and so on. The definitions and schedules are simply set up, and the execution is handled for you.
Running serverless means true "utility computing", which is the entire premise of cloud platforms such as Azure, AWS, and Google Cloud. The client pays only for what is used, no idle servers sit there costing money without producing anything, and the solution can scale up as needed.
ADF also allows parallelism while still charging only for what is used. This scaling was a huge benefit for the client when time was of the essence: one server for 100 hours and 100 servers for one hour cost the same, but the latter finishes the work in 1/100th of the time. Hentsū tuned the solution so that the speed of the initial load was limited only by the power of the database, allowing the client to balance the trade-off between speed and cost.
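A toy calculation makes the trade-off concrete (the hourly rate is invented for illustration):

```python
# Toy illustration of the parallelism trade-off: total cost is identical,
# but wall-clock time shrinks with the number of workers.
HOURLY_RATE = 0.50       # assumed cost of one worker per hour, in USD
TOTAL_WORK_HOURS = 100   # total compute needed for the initial load

for workers in (1, 10, 100):
    wall_clock = TOTAL_WORK_HOURS / workers
    cost = workers * wall_clock * HOURLY_RATE
    print(f"{workers:>3} workers: {wall_clock:>5.1f} h elapsed, ${cost:.2f} total")

# Output:
#   1 workers: 100.0 h elapsed, $50.00 total
#  10 workers:  10.0 h elapsed, $50.00 total
# 100 workers:   1.0 h elapsed, $50.00 total
```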
ADF has some programming functionality, such as loops, waits, and pipeline-level parameters. Although it is not as flexible as a full language such as Python, it gave Hentsū significant room to design the workflows.
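As a rough, hypothetical illustration of those constructs in the same Python SDK (in practice they are usually configured in the ADF web UI), a parameterised pipeline with a ForEach loop and a Wait step might be assembled like this:

```python
# Illustrative sketch of ADF control flow via the Python SDK: a pipeline
# parameter, a ForEach loop over it, and a Wait step inside the loop.
# All names are hypothetical.
from azure.mgmt.datafactory.models import (
    Expression, ForEachActivity, ParameterSpecification,
    PipelineResource, WaitActivity,
)

pipeline = PipelineResource(
    parameters={"files": ParameterSpecification(type="Array")},
    activities=[
        ForEachActivity(
            name="ProcessEachFile",
            # Loop over the 'files' parameter supplied at trigger time.
            items=Expression(value="@pipeline().parameters.files"),
            activities=[
                # Placeholder step; a real pipeline would copy or
                # transform the current file here.
                WaitActivity(name="PauseBetweenFiles", wait_time_in_seconds=30),
            ],
        )
    ],
)
```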
There are caveats, however. ADF supports a limited set of sources and sinks (i.e. inputs and outputs); the full list is available in the Microsoft documentation. Microsoft's goal with ADF is to get data into Azure products, so moving data into another cloud provider requires a different solution.
The pipelines are written in ADF's own proprietary "language", so the pipeline code does not integrate well with anything else. That would not be the case if they were written in a language like Python, as many other ETL tools allow. This is also the key reason Hentsū developed its own ETL platform for more complex solutions, built on Docker and more portable Python code.
There were some usability issues when creating the pipelines, such as a confusing UI and occasionally vague errors, but none were showstoppers. Our advice when using the ADF UI is to make small changes and save often. Microsoft is already aggressively addressing some of the issues we encountered.
The client was very pleased with the ADF and Azure SQL Data Warehouse solution. It automatically scales compute power as the data changes week by week, scaling up when there is more data and down when there is less. Overall, it costs a fraction of what the previous setup did, while keeping everything within the client's Azure environment.
Microsoft recently had a flurry of announcements about Office 365, and especially Microsoft Teams. Below, we highlight some of the key changes important to the asset management space.
Office 365 admins can now set up policies that block users from downloading files from Outlook on the web to non-compliant devices. This provides more flexibility on the go while retaining a good degree of security around company files.
Azure AD Password Protection helps you eliminate easily guessed passwords from your environment, which can dramatically lower the risk of compromise by a password spray attack. Specifically, these features let you:
- Block known weak passwords and their variants from Microsoft's global banned password list
- Define a custom banned password list tailored to your organisation
- Extend the same protection to on-premises Active Directory
To ensure clients have access to critical audit data when investigating security or regulatory incidents in their tenancy, Exchange Online now automatically enables mailbox auditing on all applicable mailboxes for users of the Commercial service. With this update, it is no longer necessary to configure the per-mailbox audit setting before the service begins storing security audit data. These audit records are key to understanding the activities taking place within the tenant.
Microsoft released a preview of a new user experience that lets users register security info for multi-factor authentication (MFA) and password reset in one place. Now when a user registers security info such as a phone number for receiving verification codes, that number can also be used for resetting a password. Likewise, users can change or delete their security info from a single page, making it easier to keep information up to date.
Meeting organizers now have the option to prevent attendees from forwarding a meeting invitation. This option is available only for users in Office 365. In the first release it appears when creating or editing meetings in Outlook on the web, with Outlook for Windows to follow shortly after.
Admins can specify TeamSite Libraries that they want their users to automatically sync with OneDrive for Business.
The Microsoft Authenticator mobile app now supports signing in to your work accounts with your face, fingerprint, or device PIN. This removes the security risk of passwords while giving you the convenience of a device you already own and carry. Administrators can configure this option in Azure Active Directory.
For more information on the latest Microsoft updates, check out the roadmap here.
To learn more about how we can support you with these updates and more, contact us today.
The public cloud market has grown and changed over the years. Every step of the way, Hentsū continues to accumulate experience and knowledge on how to adapt and stay agile.
In this interview, we cover not just the public cloud and how it is shaping the current market, but also cloud trends, the progress of serverless computing, the growth of data, the increase in quant workloads, and more. Companies are fighting to stay afloat by moving away from traditional services such as delivering servers and infrastructure. This makes room for new solutions, fresh technology, and more agile services that deliver business value directly to our clients.
[Image: cloud drivers]
With a clear goal and a solid strategy, companies get ever closer to their business targets, enabled quickly by the potential of the public cloud and the tooling that comes with it.
[/et_pb_text][/et_pb_column][/et_pb_row][et_pb_row _builder_version="4.1"][et_pb_column type="4_4" _builder_version="4.1"][et_pb_text _builder_version="4.1" custom_margin="50px||||false|false"]With public cloud computing, you consume as you need, opposed to buying upfront. Marko Djukic, HentsūThat's not all. The key focus here is that everything is powered by code and everything is driven by code. Bearing that in mind, Hentsū always had the ability to adapt and stay focused in today's business world.[/et_pb_text][/et_pb_column][/et_pb_row][et_pb_row _builder_version="4.1"][et_pb_column type="4_4" _builder_version="4.1"][et_pb_text _builder_version="4.1" hover_enabled="0"]
Marko Djukic, CEO and founder of Hentsū, reflects on the advantages of the public cloud in asset management, the enhanced security it delivers, and the evolution of data science it enables. Read the interview to learn more: The Power of the Public Cloud.
In this case study we examine the uses and advantages of Docker architecture and the benefits of a Kubernetes cluster.
One of our existing clients had been using their own machine-learning strategies to develop an in-house platform that produces trading signals from a range of alternative datasets. The four-person development team had been running for six months, building a suite of Python applications and big-data processing pipelines both on premises and in the AWS cloud.
The client approached Hentsū to extend their small development team and to improve the overall software development. The pace of feature releases was slow, the applications suffered from complexity, and code quality was poor.
The in-house developers were struggling to work as a unified team. Code was being committed and deployed with broken library dependencies, requiring manual fixes at every release to keep it running correctly. The applications were disjointed and inconsistent, built from very loosely coupled sets of scripts, software, and services. There was no robust deployment process, and once deployed the applications often needed manual intervention.
Hentsū promptly identified the need to deploy an efficient Continuous Integration/Continuous Deployment (CI/CD) pipeline as fast as possible. The focus had to be on feature development rather than tooling, and on removing any obstacles to getting great code from the developers, quickly.
As a first phase, Hentsū deployed a development workflow, which was based around the Atlassian suite of products. The goal was to enable rapid iteration of the team’s code, whilst ensuring overall software testing, quality control and integration. The workflow relied on properly defined environments – Development, Testing, Acceptance, Production (DTAP).
[/et_pb_text][/et_pb_column][/et_pb_row][et_pb_row _builder_version="4.1"][et_pb_column type="4_4" _builder_version="4.1"][et_pb_image src="https://hentsuprod.wpengine.com/wp-content/uploads/2018/06/DevOps-Workflow.jpg" show_in_lightbox="on" align_tablet="center" align_phone="" align_last_edited="on|desktop" admin_label="DevOps Workflow" _builder_version="3.23"][/et_pb_image][et_pb_text _builder_version="4.1"]
Deploying these steps and enforcing the code flow produced an instant improvement in both team collaboration and software quality. There was much better visibility into what each developer was pushing into the branches and its effect on the overall software.
Separately, Hentsū worked with the developers to restructure the Git software repositories logically into specific areas of concern (apps/services/dependencies). Each repository would contain its own tests, dependency tree and Bitbucket pipeline YAML config. This enabled more autonomy in development, whilst retaining efficient control over the cross-platform dependencies and testing.
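As a hypothetical illustration of those per-repository tests, even a small pytest smoke test can catch the broken-dependency deployments described earlier (the module list is invented):

```python
# tests/test_smoke.py - hypothetical per-repository smoke test that fails
# fast if a declared dependency is missing or broken, so dependency
# breakage is caught in the pipeline rather than at deployment.
import importlib

import pytest

# Illustrative dependency list; in practice this mirrors the repository's
# pinned requirements.
REQUIRED_MODULES = ["numpy", "pandas", "requests"]

@pytest.mark.parametrize("module_name", REQUIRED_MODULES)
def test_dependency_importable(module_name):
    importlib.import_module(module_name)
```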
Finally, the agile methodology was improved through clearer structure and scheduling. Ensuring better code quality and higher feature throughput was key, so there was a focus on activities such as sprint starts, stand-ups, development time, smoke tests, and backlog refinement. Product ownership and feedback within the business were improved by clearly identifying each feature's owner and involving them in the sprint process.
Hentsū deployed a three-person team to augment the in-house developers. The team brought Python and containerisation expertise to re-architect the applications and make them more stable, self-contained, and easily distributed across environments. As the Git repositories were restructured, corresponding Docker images were rolled out for each specific service.
Docker registry and Elasticsearch services from AWS were used to help with deployments and monitoring, without having to stand up infrastructure. To help with the deployment, scaling, and management of the Docker containers, a Hentsū-customised Kubernetes platform was rolled out. The customisation allowed the client to overcome limitations in the AWS EKS service and integrate VMware environments. This ensured consistency of deployment and tooling, and also allowed the applications to be deployed to Azure and Google Cloud Platform (GCP).
If you wish to learn more about Docker, check out our other blog posts.
[/et_pb_text][/et_pb_column][et_pb_column type="2_5" module_class="ds-vertical-align" _builder_version="3.25" custom_padding="|||" custom_padding__hover="|||"][et_pb_text _builder_version="4.1" background_color="#ffbb22" custom_padding="10px|10px|10px|10px|false|false"]Using a Kubernetes cluster, Hentsū enabled the additional functionality of automatic scaling. Worker nodes were able to run as a static number, which could be useful on-premise to limit the impact on other resources. However, as the Python code had the capability to work in parallel, deploying autoscaling allowed the number of nodes to ramp up quickly based on the queue of work. If there was a bigger queue of incoming data to process, the entire cluster could autoscale to thousands of nodes if needed. Each individual worker node was a small enough unit of compute/memory that the autoscaling for different loads of work became very linear and cost efficient.
Combining Hentsū's Kubernetes cluster management with AWS gave the client many more options for managing the workloads. The cluster could rapidly switch to GPU-enabled worker instances, while the client simultaneously used the AWS Spot market for cheaper resources when available and could move the application between regions or even cloud providers. It also opened up the possibility of deploying to bare metal, allowing VMware to be discarded.
With the ability to run the Python code in various cloud platforms, and potentially also utilise Platform as a Service (PaaS) offerings from the cloud providers, the security of the intellectual property was a concern. Hentsū deployed the entire solution in strict adherence to its own ISO 27001 cloud security checklist. Encryption was built into the application from the start, and all user access controls were tied back to the client's corporate Active Directory.
The improvements and options Hentsū enabled made the developers happier and substantially more productive. Both the Docker architecture and the Kubernetes cluster were employed successfully. Team collaboration and the engagement of business stakeholders meant that more features than initially planned were released to end users, and in a faster timeframe.
The number of bugs raised in production from each two-week release cycle fell from an average of over 30 to below two. This code quality success was ensured by the improvements Hentsū made to scheduling and structure, such as the pipeline unit tests, the consistency of the development and acceptance environments, and the rigorous smoke tests.
The greatest impact was on the overall delivery of the project. When Hentsū was first engaged, the estimated time remaining to deliver the project was 18 to 24 months; with the changes Hentsū delivered, the project was completed in under six months.
See how Hentsū can enable your data science workloads across multiple clouds, using DevOps techniques and Docker containerisation. Talk to us about your cloud workloads: contact us today.