The rise of data science
This is the era of big data. Advances in microelectronics and computer technology have made it easier to collect and store huge amounts of data arriving at high velocity. Today we can easily gather information and measurements that only a few years ago were too expensive to obtain. Over the next decade, a key factor in whether companies succeed or fall behind will be their ability to turn their data into insights and actions.
Data does not explain itself: contrary to what some people think, facts aren’t “self-demonstrating”. To make matters worse, large amounts of data come with a lot of noise, which can easily bury a meaningful signal under irrelevant information. That’s where data scientists come in. They are the ones who can cut through the noise to extract meaning and useful insights, and to make predictions that drive the company’s strategic decisions. These skills make data scientists highly valuable, so it is no wonder that demand for them is currently high across many different industries.
What is data science?
Data science is a multidisciplinary field that draws on many subject areas: statistics, computer science, research methods, business expertise and communication. A common misconception is that data science is just the application of machine learning techniques to business problems. True, machine learning (the study and construction of algorithms that can learn from data and make predictions on it) is a core component of data science, but it doesn’t stop there. Data science is better described as managing the whole process of turning data into predictions and insights. Its rise is mostly due to recent developments in computer technology, but most of its theoretical basis comes from statistics, and its way of proceeding draws heavily on the methods of scientific research.
Becoming a data scientist
The field of data science is growing so fast that employers struggle to find people with previous experience. Demand far outstrips supply, creating many opportunities for people interested in data science. So it is worth asking: what does it take to become a data scientist? Mastering any one of the disciplines data science draws on takes time, let alone all of them. This is the main reason why data science is usually a “second calling”: many data scientists started their careers as programmers, statisticians, business analysts or academic scientists. People in those professions have already mastered several of the skills data science requires. After choosing to move into this new field, they add the missing skills to their existing repertoire to become effective data scientists.
It’s (mostly) about the data
First and foremost, a data scientist must be willing to come to grips with the data. In any given project, data scientists spend the majority of their time collecting, cleaning up and analyzing data. The “sexier” machine learning part comes in only after the data has been thoroughly analyzed and understood. A good data scientist knows that proper data analysis and statistics are not optional, nor can they be tacked on at the end of a project.
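To give a flavor of what “cleaning up” data means in practice, here is a minimal sketch using only the Python standard library. The records, field names and cleaning rules are hypothetical and chosen for illustration. Real projects typically use a dedicated library and far more elaborate rules. The sketch normalizes names, parses numeric fields, drops rows whose values are missing or malformed, and removes duplicates before computing a simple summary statistic.

```python
import statistics

# Hypothetical raw records, e.g. as read from a CSV export (values made up for illustration).
raw_records = [
    {"customer": " alice ", "age": "34", "spend": "120.5"},
    {"customer": "Bob", "age": "", "spend": "80"},          # missing age
    {"customer": "alice", "age": "34", "spend": "120.5"},   # duplicate once the name is normalized
    {"customer": "carol", "age": "29", "spend": "n/a"},     # unparseable spend
    {"customer": "dave", "age": "41", "spend": "200"},
]


def clean(records):
    """Normalize names, parse numbers, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["customer"].strip().lower()
        try:
            age = int(rec["age"])
            spend = float(rec["spend"])
        except ValueError:
            # Skip rows whose numeric fields are missing or malformed.
            continue
        key = (name, age, spend)
        if key in seen:
            continue  # exact duplicate after normalization
        seen.add(key)
        cleaned.append({"customer": name, "age": age, "spend": spend})
    return cleaned


rows = clean(raw_records)
spends = [r["spend"] for r in rows]
print(len(rows), statistics.mean(spends))
```

Note how three of the five input rows disappear. Only after decisions like these have been made, and documented, does it make sense to move on to modeling.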
Rock it with science!
Data science projects start with a question: what do we want to know or improve? The data science process attempts to go from data to an answer to that question, with potentially unexpected discoveries along the way. This process draws a lot from scientific research; therefore, a good data scientist should have knowledge of, or at least an interest in, the methods of scientific research.
Programming is a must
After collecting and understanding the data, a data scientist wants to extract insights by using machine learning algorithms to build predictive models. This requires decent programming knowledge, although, to be fair, data scientists can get away with knowing much less than a professional programmer or systems engineer! Currently, the two programming languages that dominate the data science field are R and Python. Other languages and tools in use include C++, Java, MATLAB, SPSS and Scala. Given that there are plenty of libraries implementing machine learning algorithms for R and Python, it is advisable to know at least one of these two languages well.
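To show what “building a predictive model” boils down to in the simplest case, here is a minimal sketch in plain Python. It fits a straight line by ordinary least squares and uses it to make a prediction. The data, variable names and the helper `fit_linear` are hypothetical and chosen for illustration. In practice one would reach for an existing library (for example scikit-learn’s `LinearRegression` in Python) rather than hand-rolling the math.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sum and variance-style sum around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: advertising spend (x) vs. sales (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_linear(ad_spend, sales)
predicted = slope * 6.0 + intercept  # predict sales for an unseen spend level
print(f"slope={slope:.2f} intercept={intercept:.2f} prediction@6={predicted:.2f}")
```

The fitting logic is a few lines. The harder parts of the job are the ones around it: choosing the right question, preparing the data and judging whether a linear model is appropriate at all.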
Walk the walk, and talk the talk
Communication skills are very important for a data scientist. Data scientists rarely work in isolation. They are usually responsible for devising the plan of action, choosing the methods and tools, and presenting the results. They need to communicate their findings not only to the client who wants the data science results, but also to the rest of the team, who may not know statistics or machine learning. Thus, data scientists need a bit of an “evangelizing” spirit, plus the communication skills to tell a “data story” to widely different audiences.