Data Science: What, Who, Why, When, How?
What is Data Science?
Data Science is the study of data. If this were the only thing I wrote here, this article would still be 100% complete. Anyway, let us go beyond 100%, shall we?
We live in the information age and have staggering amounts of information at our fingertips, faster than ever before: heaps upon heaps of data, structured or unstructured. This is where data science comes into play.
In other words, data science is the analysis of data: exploring it, finding insights in it and drawing conclusions from it. Data science helps organizations understand their environments, analyze existing issues and uncover hidden opportunities. It has become a very important part of organizations, and its application has grown rapidly over the years. Thanks to the internet, vast volumes of data are available, and data science uses tools and scientific techniques, including machine learning algorithms, to find unseen patterns and derive meaningful information.
Who is a Data Scientist?
A data scientist is someone who works with large amounts of structured and unstructured data, using skills in mathematics, statistics and programming to visualize data, build prediction models and support decision-making.
A data scientist is a good storyteller. What is the use of a mountain of processed data that is still useless to you and me? A data scientist has to relay the information drawn from the dataset to the appropriate stakeholders in a simple, effective manner. Curiosity and a sense of humor (that is not a typo) are hallmarks of data scientists. Curiosity starts with asking the right questions and investigating. A data scientist without a sense of humor is almost a contradiction: a sense of humor helps with creative thinking, playing with ideas and seeing situations in diverse ways. In short, it encourages creativity and innovation.
A data scientist asks good questions to gain a good understanding of a problem: choosing the right set of variables, checking whether the variables are correlated, processing and analyzing the data, interpreting the results and communicating them effectively.
For example, take a dataset that contains students’ first semester grades, weekly hours of study, sex, age, failures and number of siblings. A data scientist would ask questions such as whether a student’s age or sex has any influence on, or correlation with, their first semester results.
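To make the idea of "checking for correlation" concrete, here is a minimal sketch in Python. The records are invented purely for illustration (they are not real student data), and the field names `study_hours` and `grade_1` are assumptions of this example. It computes the Pearson correlation coefficient by hand and compares average grades by sex.

```python
from math import sqrt

# Hypothetical records, invented for illustration only.
students = [
    {"age": 15, "sex": "F", "study_hours": 12, "grade_1": 16},
    {"age": 16, "sex": "M", "study_hours": 6, "grade_1": 11},
    {"age": 15, "sex": "M", "study_hours": 9, "grade_1": 14},
    {"age": 17, "sex": "F", "study_hours": 4, "grade_1": 9},
    {"age": 18, "sex": "M", "study_hours": 3, "grade_1": 8},
    {"age": 16, "sex": "F", "study_hours": 10, "grade_1": 15},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ages = [s["age"] for s in students]
hours = [s["study_hours"] for s in students]
grades = [s["grade_1"] for s in students]

# A value near +1 or -1 suggests a strong linear relationship; near 0, a weak one.
age_vs_grade = pearson(ages, grades)
hours_vs_grade = pearson(hours, grades)

# Sex is a category, not a number, so compare group averages instead.
grades_by_sex = {}
for s in students:
    grades_by_sex.setdefault(s["sex"], []).append(s["grade_1"])
mean_grade_by_sex = {sex: sum(g) / len(g) for sex, g in grades_by_sex.items()}

print("age vs grade:", round(age_vs_grade, 2))
print("study hours vs grade:", round(hours_vs_grade, 2))
print("mean grade by sex:", mean_grade_by_sex)
```

Keep in mind that a correlation only shows that two variables move together; it does not prove that one causes the other.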
Why Data Science?
Harvard Business Review called data scientist the sexiest job of the 21st century, and rightly so. The average data scientist salary is $100,560, according to the U.S. Bureau of Labor Statistics. Figures for Nigerian data scientists conflict: the salary ranges from ₦250,000 to as high as ₦3 million, depending on the source cited.
According to Glassdoor and Forbes, demand for data scientists will increase by 28 percent by 2026. So if you want a secure and stable career, it is in your best interest to position yourself for the opportunities this field offers. Organizations today cannot survive without the bytes and megabytes (zettabytes, actually) of data available to them. There were approximately 44 zettabytes of data in the world as of 2020, and according to reports that number is expected to quadruple by 2025.
As previously stated, this is the information age, and who better to navigate it than someone who deals with information and data? In the Iron Age, blacksmiths were the real deal. They were in high demand for obvious reasons: they had the skills to provide value in their time, and they were highly rewarded and respected for it. Little wonder, then, at the hype surrounding data science.
Where is data science used?
Because data is all-encompassing in the world we live in today, you could work in any industry of your choice. It might sound cliché (it actually does), but it is true. Just have the passion to solve problems there and you are good to go.
Data has become central to everything we do in today’s world, thanks to the advent and advancement of the internet. Every industry you can think of uses data. From agriculture and entertainment to sports and medicine, they all integrate data science in one way or another.
Remember the example of the student dataset with first semester grades, weekly hours of study, sex, age, failures and number of siblings? The second semester grades can be predicted using those variables. More relevant variables, such as grades in previous years or hours spent on social media, can be added to the dataset to make the predictions more accurate.
For instance, Netflix uses data science to recommend movies to you based on factors like age, location and previously watched movies. Some of these variables have more impact than others. This is an example of a recommendation model; dating sites and music streaming services use them too.
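As a rough sketch of what "predicting" a grade can look like, the Python snippet below fits a straight line by ordinary least squares, using only the first semester grade as a predictor. This is a deliberate simplification: a real model would likely use several variables at once. The grades are invented for illustration, and the function names `fit_line` and `predict` are assumptions of this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted value for a new input x on the fitted line."""
    return slope * x + intercept


# Hypothetical grades (out of 20) for students whose results are already known.
first_semester = [16, 11, 14, 9, 8, 15]
second_semester = [17, 12, 13, 10, 9, 16]

slope, intercept = fit_line(first_semester, second_semester)

# Estimate the second semester grade of a new student who scored 13.
estimate = predict(13, slope, intercept)
print("predicted second semester grade:", round(estimate, 1))
```

Adding more relevant variables would turn this into a multiple regression, which is usually done with a library rather than by hand.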
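To give a feel for how a recommendation model can work, here is a toy sketch in Python. It is not Netflix's actual method, which is not described here; it is a simple "people who watched what you watched also watched" approach based on Jaccard similarity between watch histories. The user names and movie titles are made up.

```python
def jaccard(a, b):
    """Share of titles two users have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, histories):
    """Rank titles the user has not seen, weighted by how similar
    the users who watched them are to this user."""
    watched = histories[user]
    scores = {}
    for other, titles in histories.items():
        if other == user:
            continue
        similarity = jaccard(watched, titles)
        for title in titles - watched:
            scores[title] = scores.get(title, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical watch histories, invented for illustration.
histories = {
    "ada": {"Movie A", "Movie B", "Movie C"},
    "bayo": {"Movie A", "Movie B", "Movie D"},
    "chi": {"Movie C", "Movie E"},
}

print(recommend("ada", histories))  # ['Movie D', 'Movie E']
```

Here "Movie D" ranks first because bayo, who watched it, shares more titles with ada than chi does. Real systems add many more signals, such as the age and location factors mentioned above.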
Here is a list of industries and application areas where data science is used:
· Healthcare
· Recommendation systems
· Banking and finance systems
· Education
· Transport
· Sports
· Image and speech recognition
· Weather forecasting
This list could go on and on, so I’ll stop here.
How do you become a data scientist?
There are different effective ways to become a successful data scientist. The only ‘wrong’ approach is to try to become a copy of a particular successful data scientist by following that person’s blueprint to the letter. Instead, modify the blueprint and see what works for you. Do not try to become a lite version of someone else: you are unique, and you know best what works for you.
Know what time works best for you: night or day? Do you have a constant electricity supply and, if not, at what times is it available? These are some of the factors to consider, apart from the characteristics of a data scientist that I mentioned earlier.
As a complete beginner, you should build a decent knowledge of statistics and programming. YouTube is a great resource for learning for free; the downside is that the content there is unstructured, and you could easily get overwhelmed by the heap of tutorials. Another way is to take paid courses on online learning platforms like Udemy and Coursera. There are also textbooks that explain the concepts, if you prefer to read. But the best way to learn is by actually building things. This point cannot be overemphasized.
There is a saying: ‘in doing, we become’. So keep ‘doing’ data science, and you will become a data scientist.