By Alex Woodie
Data science jobs are among the highest paying jobs around the world, thanks to the rapid pace of data creation and budding need to make sense of it all. Whether you’re a card-carrying data scientist or just a developer looking to round out your resume with new data science abilities, you can increase your marketability by knowing what types of skills employers are looking for in 2019.
The shortage of data scientists around the country has been well documented. The August 2018 LinkedIn Workforce Report found that there were more than 151,000 data scientist jobs going unfilled across the U.S., with “acute” shortages in New York City, San Francisco, and Los Angeles.
While data scientists aren’t the only ones who can tell us what’s in our data or build a machine learning model, their particular skillsets (mathematical/statistical competency, expertise in distributed computing, and business acumen) make them the prized “unicorns” who can whip a big data analytics or AI project into shape.
In terms of skills shortages, big data and analytics is the area of greatest need, according to the recent KPMG CIO Survey. Nearly half (46%) of CIOs who participated in the survey said they suffered from a skills shortage in that area, followed by a shortage in AI skills at 38%.
One way to analyze the demand for data science skills is to see what employers are paying for them. Randstad US released its 2019 Salary Guide yesterday, which found that salaries for data scientists nationally ranged from nearly $85,000 to more than $119,000. Machine learning engineers, an emerging technical specialty that’s distinct from data scientists, earned from about $79,000 to nearly $133,000 nationally, the report found.
Skills shortages in big data, analytics, and AI are the top concerns for CIOs, according to a KPMG report
However, salaries for data scientists and machine learning engineers can be much higher in specific regions. For example, Randstad US’s 2018 Salary Guide found that entry-level data scientists in the San Francisco Bay Area were making an average of nearly $107,000, while senior-level data scientists exceeded $172,000 on average. Salaries for machine learning engineers, by contrast, ranged from about $98,000 to north of $165,000.
It’s not easy to build a team of professionals to leverage AI and big data analytics. According to a forthcoming report from O’Reilly Media, nearly 60% of respondents are either building their own data science platforms or evaluating pre-built data science tooling. The report, which will be released at the Strata Data Conference scheduled for March 25 to 28 in San Francisco, found that 44% of companies are looking for people with data science skills and 41% are on the hunt for data engineering talent.
“It is clear that in 2019 companies are planning to invest in implementing analytics, AI and automation tools,” said Ben Lorica, O’Reilly’s chief data scientist and chair of the Strata Data Conference. “However, in order to do so successfully, initial investments must be made in the foundational technologies and infrastructure needed to sustain success.”
The Skills, Please
So, what skills should a data scientist have in 2019? That’s a subjective question, of course, but we did our best to hash out some answers.
Python, which can be used to write data science applications as well as general purpose applications, trailed only Kotlin and Go under the metric “language developers want to learn in 2019.” “Interestingly, developers’ interest in Scala has dropped, whereas their interest in TypeScript has increased,” HackerRank notes in its report. Scala was the third most popular language last year, but dropped to sixth in 2019.
Apache Spark, which is used for creating distributed data analytics applications, was the sixth most popular framework that developers want to learn in 2019, trailing general-purpose frameworks like React, AngularJS, Vue.js, Django, and Ruby on Rails. Nearly half of the respondents in O’Reilly’s survey say they have used Spark or Spark Streaming.
The widespread appeal of Spark is not surprising given the capabilities it gives developers, according to Seth Dobrin, the head of IBM’s Data Science Elite Team. “Spark continues to be a real important tool and people need to know how to interact with Spark or use Spark to interact with different database and different compute components,” Dobrin tells Datanami.
What gives Spark its legs is its broad applicability and high performance, Dobrin says. “You can have one [Spark] interface for just about any database,” he says. “I can leverage Spark to access Hadoop or MS SQL or RedShift or Db2, and it does it in a highly performant manner.”
IBM yesterday announced a new data scientist certification program in conjunction with The Open Group. It also announced a new 24-month Data Science Apprenticeship program as part of the “new collar” program that CEO Ginni Rometty launched several years ago.
Another way to ascertain data science skills is to see what types of people other companies are hiring. Diffbot, which claims that its Knowledge Graph “allows any business to treat the entire Web as a database for business intelligence,” looked at the types of folks that Facebook poaches from the other tech giants, including Microsoft, Google, Yahoo (now Oath), Amazon, and Apple.
Being able to manipulate data is another important skill that data scientists (or data science teams) should have. Building data pipelines, managing ETL processes, and prepping data for analysis remain some of the most time-consuming tasks for data scientists (although vendors are rapidly working to automate them).
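To make the extract, transform, and load steps concrete, here is a minimal Python sketch that uses only the standard library. The sample records, column names, and function names are all hypothetical and chosen purely for illustration; a real pipeline would read from actual sources and would likely lean on dedicated tooling.

```python
import csv
import io

# Hypothetical raw input: inconsistent whitespace, casing, and a missing value.
RAW_CSV = """name,city,salary
Ada, New York ,100000
Grace,San Francisco,
Linus,new york,120000
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim fields, normalize city casing, drop rows with no salary."""
    cleaned = []
    for row in rows:
        salary = row["salary"].strip()
        if not salary:
            continue  # skip incomplete records rather than guessing a value
        cleaned.append({
            "name": row["name"].strip(),
            "city": row["city"].strip().title(),
            "salary": int(salary),
        })
    return cleaned

def load(rows):
    """Load: aggregate into an average salary per city (stand-in for a real sink)."""
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(row["salary"])
    return {city: sum(values) / len(values) for city, values in by_city.items()}

result = load(transform(extract(RAW_CSV)))
print(result)
```

Most of the code sits in the transform step, which mirrors the point above that data preparation tends to be where the time goes.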
Once the data is ready for analysis, the ability to wield algorithms effectively is another critical skill. Knowledge of classic machine learning algorithms like regressions, K-means, and SVMs (among others) is de rigueur for the modern data scientist. Increasingly, data scientists need to know how to put together a deep learning setup, in which case knowledge of frameworks like PyTorch and TensorFlow would be needed.
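As an illustration of one of the classic algorithms named above, the following is a bare-bones K-means sketch in plain Python. It implements the standard assign-then-update loop (Lloyd's algorithm) on a handful of made-up 2-D points; the function names and sample data are assumptions for this example, and production work would normally use a tested library implementation instead.

```python
import random

def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2 + (point[1] - centroids[i][1]) ** 2,
    )

def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for i, cluster in enumerate(clusters):
            if cluster:
                updated.append((
                    sum(x for x, _ in cluster) / len(cluster),
                    sum(y for _, y in cluster) / len(cluster),
                ))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Which group ends up labeled 0 or 1 depends on the random starting centroids, so only the grouping itself is meaningful, not the label values.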
According to HackerRank, Internet of Things and deep learning will be the hottest development targets over the next two years, driven by investments in the healthcare and automotive sectors (i.e., self-driving cars). Other new technologies, like cloud machine learning and computer vision, also rank high on the HackerRank list for their likelihood of achieving a critical mass of adoption. The fact that blockchain and quantum computing were considered mostly unrealistic technologies by 2020 actually lends HackerRank’s report greater credibility.
Data science is a vibrant field to be in at the moment. It carries a lot of dynamism in terms of the skillsets that folks need to be successful, as well as the impacts that good data science can have on an organization. Despite those facts, there’s a yawning gap between the number of data scientists needed and those available, which is reflected in the generous compensation that today’s data scientists can draw.
No matter which end of the data science spectrum you find yourself on, you have reason to be optimistic.
Read more here: www.datanami.com/feed/
Posted on: January 30, 2019