Data pollution is expensive! Most organisations estimate that around 25% of their data is inaccurate, a factor that strongly influences business results. This is best explained by the 1-10-100 rule: if it costs you 1 euro to prevent a data mistake, it will cost you 10 euros to fix it afterwards, and 100 euros if a problem is created by that mistake.
Our approach ensures that companies prevent data pollution at the source: the data pipeline.
**We build automated tests into your data pipeline to prevent dirty data from affecting your business.**
In roughly two weeks' time we will, with the support of your domain specialists:
Data scientists operate more and more in a world that goes beyond their local “island” of RStudio or Jupyter notebooks. There is a growing need to integrate trained models into a production environment. As a result, the code starts an extended life outside the notebook, where it will be maintained and changed by other data scientists.
In this transition, the code acquires new requirements. Written with the initial goal of creating the most accurate machine learning model, in most cases the code cannot easily be integrated into a production environment or maintained by others. Data or DevOps engineers step in to support in these cases, but these disciplines don't always speak the same language and may take a different approach to how software should be developed.
In this hands-on workshop, created specifically for data scientists, we will show you the world of clean coding and discuss problems we have seen when moving machine learning models into production. We will do so through a series of short refactoring katas, each starting with a theoretical part. As a result of this workshop, the transition from local data science code to a solid production product will be easier, and you will gain business value faster.
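To give a flavour of the kind of refactoring such a kata covers, here is a minimal, hypothetical sketch (not taken from the workshop material; column names and the exchange rate are made up): notebook-style script code is moved into a small, named, testable function.

```python
import pandas as pd

# Typical notebook-style code: hard-coded values, no reuse, no tests.
#   df = pd.read_csv("sales.csv")
#   df = df[df["amount"] > 0]
#   df["amount_eur"] = df["amount"] * 0.92
#   print(df["amount_eur"].sum())

# Refactored: the same logic as a small, documented, testable function.
def total_in_eur(df: pd.DataFrame, usd_to_eur: float = 0.92) -> float:
    """Sum the positive sale amounts, converted from USD to EUR."""
    positive = df[df["amount"] > 0]
    return float((positive["amount"] * usd_to_eur).sum())
```

A function like this can be unit-tested with a three-row DataFrame and reused by other engineers, which is exactly the kind of production-readiness the workshop aims for.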
Complex data calculations in the cloud are expensive and often consist of a series of subsequent calculations. We can show you how to make smart use of cheap temporary resources (so-called Spot instances) to save costs.
Per-machine-hour costs can be reduced by up to 90%, creating a significant reduction in €/output for heavy data workloads in the cloud. Our experts will help you map out your cloud data processing costs, reduce them, and make them manageable.
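The arithmetic behind that claim can be sketched as follows; the prices and discount are hypothetical examples, not quotes for any actual cloud provider:

```python
def effective_hourly_cost(on_demand_price: float, spot_discount: float) -> float:
    """Hourly machine cost after a Spot discount (0.90 means 90% off on-demand)."""
    return on_demand_price * (1.0 - spot_discount)

def cost_per_output(hourly_cost: float, outputs_per_hour: float) -> float:
    """Euros per unit of output for a data workload at a given throughput."""
    return hourly_cost / outputs_per_hour

# Example: a machine of 2.00 €/hour at a 90% Spot discount costs 0.20 €/hour.
# At 4 processed batches per hour, that is 0.05 € per batch instead of 0.50 €.
```

The €/output figure also makes the trade-off visible: Spot capacity can be reclaimed, so interrupted work may lower the effective throughput and must be factored in.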
Your model and data in production from day 1 – continuously and always up to date.
Continuous Delivery, the short-cycled and reliable roll-out of new software into a production environment, has revolutionised software engineering. We can help you apply this smart principle to your data and models, giving you instant feedback and the fastest ROI on your smart applications.
Do you need short-term support on your data projects? Looking to add knowledge, or to temporarily add someone to your team? Whether it's a short- or long-term project, with every single consultant you gain access to the knowledge of all Dataworkz engineers to develop state-of-the-art data platforms or machine learning applications.
Not sure if you have made the right choice? Our experts will be happy to give you a second opinion.
Our Microsoft certified trainers take you into the Microsoft world of AI and ML.
Apply now

From the first steps in Python to creating your first deep learning projects.
Apply now

Many data streams lead to Rome. But which one suits you best?
Apply now