What Is a Data Scientist?
What is Data Science and a Data Scientist?
Hi, this is Sami Zain. In this article I’m going to talk about the data scientist. So what is a data scientist? A data scientist is a professional responsible for collecting, analyzing and interpreting large amounts of data to identify ways to help a business improve operations and gain a competitive edge over rivals. The area of study that involves extracting knowledge from data is called data science, and people practicing in this field are called data scientists.
Data scientists are a new breed of analytical data expert who have the technical skills to solve complex problems, and the curiosity to explore which problems need to be solved. They’re part mathematician, part computer scientist and part trend-spotter. And because they straddle both the business and IT worlds, they’re highly sought after and well paid. Who wouldn’t want to be one?
They’re also a sign of the times. Data scientists weren’t on many radars a decade ago, but their sudden popularity reflects how businesses now think about big data. That unwieldy mass of unstructured information can no longer be ignored and forgotten. It’s a virtual gold mine that can help boost revenue, as long as someone digs in and unearths business insights that no one thought to look for before. Enter the data scientist.
The data scientist role is an offshoot of the statistician role that adds advanced analytics technologies, such as machine learning and predictive modeling, to provide insights beyond traditional statistical analysis. The demand for data science skills has grown significantly in recent years as companies look to glean useful information from the voluminous amounts of structured, unstructured and semi-structured data that a large enterprise produces and collects, collectively referred to as big data.
Data scientists are big data wranglers. They take an enormous mass of messy data points (structured and unstructured) and use their formidable skills in math, statistics and programming to clean, manage and organize them. Then they apply all their analytic powers (industry knowledge, contextual understanding and skepticism of existing assumptions) to uncover hidden solutions to business challenges.
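As a rough illustration of the "clean, manage and organize" step, the Python sketch below normalizes a few messy records: it trims whitespace, standardizes casing, converts numbers stored as text, drops rows with a missing value and removes duplicates. The records and the field names `customer`, `region` and `spend` are made up for this example, and only the standard library is used; real pipelines would typically rely on a dataframe library and domain-specific validation rules.

```python
# Hypothetical raw records: inconsistent casing, stray whitespace,
# numbers stored as strings, a missing value and a duplicate customer row.
RAW_RECORDS = [
    {"customer": " Alice ", "region": "north", "spend": "120.50"},
    {"customer": "BOB", "region": "South ", "spend": "80"},
    {"customer": "alice", "region": "NORTH", "spend": "120.50"},
    {"customer": "Carol", "region": "east", "spend": ""},
]


def clean_record(record):
    """Normalize one record; return None if the required spend value is missing."""
    spend_text = record.get("spend", "").strip()
    if not spend_text:
        return None  # drop rows with no spend value
    return {
        "customer": record["customer"].strip().title(),
        "region": record["region"].strip().lower(),
        "spend": float(spend_text),
    }


def clean_dataset(records):
    """Clean every record, drop invalid rows and remove duplicates (keeping order)."""
    seen = set()
    cleaned = []
    for record in records:
        row = clean_record(record)
        if row is None:
            continue
        key = (row["customer"], row["region"], row["spend"])
        if key in seen:
            continue  # same customer, region and spend after normalization
        seen.add(key)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    for row in clean_dataset(RAW_RECORDS):
        print(row)
```

Running it prints two cleaned rows, for Alice and Bob: Carol is dropped because her spend is missing, and the second Alice row is removed as a duplicate once casing and whitespace are normalized.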
Every business generates a huge amount of data covering transactions, inventory, sales, marketing, customers, external sources and many other dimensions. This data holds enormous value, but that value is latent: there are many trends and patterns that need to be mined before the business can use them. If these relationships or patterns can be recognized and turned into insights, they add huge value for the business.
This is where the data scientist comes in.
Extracting information, meaning or value from data is a mammoth task. It requires a variety of skill sets that are interdisciplinary in nature. Taking a deeper look: first, we will have loads of data accumulated across various sources, and it may be structured, unstructured or semi-structured. Amalgamating all these sources coherently into a proper dataset on which analysis can be performed requires a lot of technology skills. Technologies such as Java, Hadoop, MapReduce, SQL, Hive, Pig and Python are widely used, though this list is not exhaustive.
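A minimal sketch of amalgamating sources, assuming one structured CSV export of orders and one semi-structured JSON feed of customer profiles. Both sources, their field names and the helper functions are hypothetical and are inlined as strings so the example is self-contained. It joins them on a customer id into one flat dataset using only Python's standard library; at scale, the same join would more likely run in SQL, Hive or Spark.

```python
import csv
import io
import json

# Hypothetical structured source: a CSV export of orders.
ORDERS_CSV = """order_id,customer_id,amount
1,c1,25.00
2,c2,40.00
3,c1,15.50
"""

# Hypothetical semi-structured source: JSON profiles where "tags" is optional.
PROFILES_JSON = """[
  {"id": "c1", "name": "Alice", "tags": ["loyal"]},
  {"id": "c2", "name": "Bob"}
]"""


def load_orders(text):
    """Parse CSV text into a list of dicts with typed fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "order_id": int(row["order_id"]),
            "customer_id": row["customer_id"],
            "amount": float(row["amount"]),
        }
        for row in reader
    ]


def load_profiles(text):
    """Parse JSON text into a lookup keyed by customer id."""
    return {profile["id"]: profile for profile in json.loads(text)}


def build_dataset(orders, profiles):
    """Join each order with its customer profile into one flat row."""
    rows = []
    for order in orders:
        profile = profiles.get(order["customer_id"], {})
        rows.append({
            **order,
            "name": profile.get("name"),
            # Semi-structured field: default to an empty list when absent.
            "tags": profile.get("tags", []),
        })
    return rows


if __name__ == "__main__":
    dataset = build_dataset(load_orders(ORDERS_CSV), load_profiles(PROFILES_JSON))
    for row in dataset:
        print(row)
```

Keeping each source behind its own loader means a new feed only needs a new parser, while the join logic stays the same.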
Next, analyzing the data requires math and statistical skills combined with technology. You perform exploratory data analysis, run various statistical tests, build regression models and so on to get more information out of the data. Here you need skill sets such as machine learning, deep learning and statistical inference. Finally, bringing this to the business requires a combination of business sense, design thinking, problem-solving, behavioral science and a variety of other skills. A data scientist uses large amounts of data to develop hypotheses, make inferences and home in on customer, business and market trends. The data scientist must be able to communicate how to use analytics data to drive business decisions, which may include changing course, improving a process or product, or creating new services or products. In the latter case, the data scientist is involved in the development process; in software, for example, the role involves using data analytics to prescribe new features. Data scientists also set best practices for data collection, use of analytics technology and data interpretation.
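To make the exploratory analysis and regression step concrete, here is a small Python sketch on invented data (monthly ad spend against sales). It computes summary statistics, the Pearson correlation and an ordinary least squares line by hand so the arithmetic is visible. The numbers and function names are illustrative assumptions, not a prescribed workflow; in practice a statistics or machine learning library would also report uncertainty and diagnostics.

```python
from statistics import mean, stdev

# Hypothetical data: monthly ad spend vs. sales, both in thousands.
AD_SPEND = [1.0, 2.0, 3.0, 4.0, 5.0]
SALES = [3.1, 4.9, 7.2, 8.8, 11.0]


def describe(values):
    """Exploratory summary: count, mean, sample standard deviation, min and max."""
    return {
        "n": len(values),
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept


if __name__ == "__main__":
    print("ad spend:", describe(AD_SPEND))
    print("sales:", describe(SALES))
    print("pearson r:", round(pearson_r(AD_SPEND, SALES), 4))
    slope, intercept = fit_line(AD_SPEND, SALES)
    print(f"sales ~= {intercept:.2f} + {slope:.2f} * ad_spend")
    print("predicted sales at spend 6.0:", round(intercept + slope * 6.0, 2))
```

On this made-up data the fitted slope is about 1.97 with an intercept of about 1.09, and the correlation is close to 1. A strong correlation like this is a starting hypothesis for the business, not proof that ad spend causes sales.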
Soft skills required for data scientists include intellectual curiosity combined with skepticism and intuition, along with creativity. Interpersonal skills are also a critical part of the role, and many employers want their data scientists to be data storytellers who know how to present data insights to people at all levels of an organization. They also need leadership skills to steer data-driven decision-making in an organization. The education requirements for data scientists typically include a bachelor's degree in statistics, data science, computer science or mathematics. Hard skills required for the job include data mining, machine learning and the ability to integrate structured and unstructured data. Experience with statistical research techniques, such as modeling, clustering and segmentation, is also often necessary.
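Clustering and segmentation can be illustrated with a plain k-means (Lloyd's algorithm) sketch that groups hypothetical customers by annual visits and average basket size. It seeds centroids deterministically with a farthest-first rule so the result is reproducible. The data, feature choice and seeding rule are assumptions for this example; in practice features would usually be scaled first, because unscaled Euclidean distance is dominated by whichever feature has the larger values.

```python
import math

# Hypothetical customers: (annual visits, average basket size in dollars).
CUSTOMERS = [
    (2, 10.0), (3, 12.0), (2, 11.0),     # occasional visits, small baskets
    (20, 15.0), (22, 14.0), (21, 16.0),  # frequent visits, small baskets
    (5, 80.0), (6, 85.0), (4, 78.0),     # occasional visits, large baskets
]


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point whose distance to its nearest chosen centroid is largest."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and update until labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(CUSTOMERS, k=3)
    for c, centroid in enumerate(centroids):
        members = [p for p, label in zip(CUSTOMERS, labels) if label == c]
        print(f"segment {c}: centroid={centroid}, customers={members}")
```

Each resulting segment (for example, "frequent small-basket shoppers") could then be targeted differently, which is where the business and storytelling skills above come back in.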
Data science requires knowledge of a number of big data platforms and tools, including Hadoop, Pig, Hive, Spark and MapReduce, as well as programming languages such as Structured Query Language (SQL), Python, Scala and Perl, and statistical computing languages such as R.
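The MapReduce model named above splits work into a map phase, a shuffle that groups intermediate values by key, and a reduce phase. The single-process Python sketch below simulates those three phases for the classic word-count task. It does not use Hadoop's API, the function names are hypothetical, and the documents are made up; on a real cluster, Hadoop distributes each phase across many machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input documents.
DOCUMENTS = [
    "big data needs big tools",
    "data science turns data into insight",
]


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key; here, sum the counts."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, shuffle the pairs, then reduce each group."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for word, count in sorted(word_count(DOCUMENTS).items()):
        print(word, count)
```

Because each map call looks at one document and each reduce call looks at one key, both phases can run in parallel; only the shuffle needs to move data between workers.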
Data scientists are responsible for those tasks and many more. These professionals are equipped to analyze big data using advanced analytics tools and are expected to have the research background to develop new algorithms for specific problems. They may also be tasked with exploring data without a specific problem to solve. In that scenario, they need to understand the data and the business well enough to formulate questions and deliver insights back to business executives, with the goal of improving business operations, products, services or customer relations. These additional responsibilities are reflected in a salary more than double that of a data analyst; Glassdoor.com put the average U.S. data scientist salary at $113,436 as of May 2017.
So, on a closing note, we can say: "A data scientist is a person who possesses most of the above-mentioned Math + Technology + Business skill sets and extracts valuable information from data for the business."