New Year, New Data Science Initiative… time to hire a Data Scientist?

  • #DataScience
  • #DataEngineer
  • #DataScientist

Article published by Sam Strong, Director West Coast at Keyrus

As we finish 2019, I wanted to share some of my thoughts for companies looking to start data science initiatives in the New Year. Disclaimer: This post is my opinion and a generalization based on what I see, working in the data analytics industry. As such, it might or might not reflect what you see in your particular sector or area.

Data Science, Machine Learning and AI have become front of mind in the business community as ways to create a competitive advantage. We are all familiar with applications of these in our day-to-day lives – from Netflix and Amazon recommendation engines, to estimating market trends in Zillow, or setting recommended rental values on Airbnb. As consumers, we are constantly seeing the results of data science in action. With this growing level of consumer interaction, it doesn’t take much imagination for a smart businessperson to start coming up with ideas of how Data Science could be applied.

Great – so we have an idea about what data science is and how it can be applied to our business – now time to hire data scientists?!

In a competitive market, hiring a data scientist with a proven track record at a leading tech company (like Netflix or Amazon) is going to be a big investment and take at least a few months. As such, this is not exactly the quick proof of value you were looking for, and it is only an option for companies with deep pockets.

Universities are doing a good job of educating students in data science and statistics-related subjects, but academia is a long way from the real world. Generally, these students are coming to the workplace with the ability to build and refine machine learning models using Python or R. Unfortunately, you’ll quickly find that writing the R or Python for an ML model is only a small part of the process (and with the right tech can be almost fully automated). The reality is that they might not even get as far as building a model – because to build a model, you need data.

In most organizations, all the data isn’t in one nice, clean data warehouse – it’s everywhere. Data Science often requires access to many different types of data. Bringing data together so that data scientists can start to build and train models can be a massive task. ETL/ELT and data modelling aren’t what most data scientists are trained to do, and they can be particularly hard in the type of locked-down, production IT environment found inside many large organizations.

Ok – so should we also hire data engineers or wait for IT to finish the data warehouse project they started in 2016?!

Good data engineers are also hard to recruit and this will take more time… But let’s imagine we do that too – now our data engineers help consolidate the data, the data scientists build an ML model and now we have to deploy it. Using a model built in R or Python in a production process (like a recommendation engine API) usually requires the model to be re-coded into Java or C and deployed by an IT team. This process itself can take months, and because this is your first Data Science initiative, your IT team probably doesn’t know where to start. Once it’s deployed as an API endpoint, it still needs to be integrated into an application before it will be of any use to the business users or external customers who asked for it in the first place.

Now we are getting to the end of 2020 and planning the budget for 2021 – the data science initiative has cost more than expected and has little to show for all the work… There must be a better way!

If you want to start a data science initiative, don’t start by hiring a data scientist!

To execute on a successful data science initiative, use a well thought out, multi-phase approach focused on delivering value to the end consumer (internal or external) and mitigate risk with a quick, low-cost pilot.
If you’re data savvy and a bit technical with some time to spare, you can lead this yourself. If that’s not you, the Keyrus data science team can run this pilot for you in just 2 weeks. Either way, the following approach will provide a solid starting point, for less than the cost of recruiting one data scientist:

Step 1)
It’s important to start with a small pilot or proof of value focused around a particular use case. To find the initial use case, a small amount of discovery should take place for the front-running ideas. This should include:
  • Getting an understanding of the business scenario
  • How to access the appropriate data sets
  • How to measure the business value of a successful implementation
  • How end users will interact with the results of the model

When answering these questions this time around, think about the simplest way to achieve the desired outcome (e.g. if we are creating a model to identify accounts that could churn, it’s probably OK for the outcome of the pilot to be an Excel spreadsheet with a list of accounts for one sales rep rather than building a prediction into the CRM – even if that is how we’ll ultimately want to scale it to the whole organization).
Choose to move forward with the simplest idea that has the potential to provide at least $1m of value if fully implemented.
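
For readers who are comfortable with a little scripting, the "list of accounts for one sales rep" outcome can be sketched in a few lines. This is only an illustration under stated assumptions: the account names, the sales rep names, the churn_score field and both helper functions are made up for this example, and the scores would come from whatever tool you use in the pilot. The output is CSV text, which opens directly in Excel.

```python
import csv
import io


def churn_shortlist(accounts, sales_rep, top_n=10):
    """Return one rep's accounts, ranked from highest to lowest churn score."""
    mine = [a for a in accounts if a["sales_rep"] == sales_rep]
    return sorted(mine, key=lambda a: a["churn_score"], reverse=True)[:top_n]


def to_csv(rows, fieldnames):
    """Render the rows as CSV text that a spreadsheet tool can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical, made-up pilot data: in practice the scores come from the model.
ACCOUNTS = [
    {"account": "Acme", "sales_rep": "Dana", "churn_score": 0.82},
    {"account": "Globex", "sales_rep": "Dana", "churn_score": 0.35},
    {"account": "Initech", "sales_rep": "Lee", "churn_score": 0.91},
    {"account": "Umbrella", "sales_rep": "Dana", "churn_score": 0.67},
]
FIELDS = ["account", "sales_rep", "churn_score"]

shortlist = churn_shortlist(ACCOUNTS, "Dana", top_n=2)
csv_text = to_csv(shortlist, FIELDS)
print(csv_text)
```

The point of the sketch is the shape of the deliverable, not the tooling: a short, ranked list that one person can act on is enough for a pilot.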

Step 2)
Now you’ve chosen which idea to pilot, it’s time to try it out! Rather than trying to learn R or Python, use a trial of self-service software (e.g. Alteryx) for a quick and easy way to join data from different sources and run some pre-built ML models to make your first predictions.

Step 3)
Once you have an output – test it with a small group of the users and measure the results. Even if the model isn’t perfect or well refined, we can record any improvements on current performance and establish the value of investing more time.
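
As a rough illustration of "measure the results", the arithmetic below compares a pilot group against current performance and scales the difference up to the whole organization. Every number here is invented for the example, and the function names are hypothetical; the same sums work just as well in a spreadsheet. It is a back-of-the-envelope estimate, not a statistical test, and results from a small pilot group can be noisy.

```python
def retention_rate(retained, total):
    """Share of accounts that were kept during the measurement period."""
    if total <= 0:
        raise ValueError("total must be positive")
    return retained / total


def projected_annual_value(uplift, accounts_in_scope, value_per_account):
    """Scale the pilot uplift to every account the idea could cover."""
    return uplift * accounts_in_scope * value_per_account


# Hypothetical figures, chosen only to show the calculation.
baseline = retention_rate(retained=160, total=200)  # current performance
pilot = retention_rate(retained=45, total=50)       # accounts worked from the model's list
uplift = pilot - baseline                           # improvement seen in the pilot

value = projected_annual_value(
    uplift,
    accounts_in_scope=2000,   # assumed number of accounts if fully rolled out
    value_per_account=6000,   # assumed annual revenue per retained account ($)
)
meets_threshold = value >= 1_000_000  # the $1m bar suggested in Step 1

print(f"Uplift: {uplift:.1%}, projected value: ${value:,.0f}")
```

If the projected value clears the bar even with conservative inputs, that is a reasonable signal to invest more time in refining the model.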

After step 3, we can evaluate whether this really is a project that should move towards production and make an estimate of the potential returns to the organization. We will also have a better idea of the costs involved in moving it into production and the skill sets required. With this information, you can make a data-driven case to management for building a team and allocating significant budget to data science initiatives.