Data science is all about understanding data, finding patterns in it, and using those patterns to predict something.
The first step in data science is getting hold of the data. Data can be available in different formats, such as CSV files, Excel workbooks, SQL tables, and text files.
Once we have the data, we can start exploring and refining it. Excel is a very good tool for exploring data, and it comes with a lot of built-in functionality.
The second step is to find any anomalies in the data you have and deal with them, such as:
- Duplicate data
- Incorrect data: for example, most values have 2 digits but suddenly a 3-digit or 4-digit value appears, so you need to work out whether it is correct or should be removed
- Blank data: a blank value can be filled in with a relevant value, such as an average, or the record can be removed
The list above is not exhaustive; there can be other scenarios as well.
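The same three clean-up checks can also be done in code. Below is a minimal sketch in plain Python. The sales numbers, the 2-digit range (10 to 99), and the choice to fill blanks with a rounded average are all assumptions made up for illustration, not rules that fit every dataset.

```python
from statistics import mean

# Hypothetical daily ice cream sales; None marks a blank cell.
rows = [
    ("Mon", 42),
    ("Tue", 38),
    ("Tue", 38),    # duplicate row
    ("Wed", None),  # blank value
    ("Thu", 4500),  # suspicious: every other value has 2 digits
    ("Fri", 51),
]

# 1. Drop exact duplicates while keeping the original order.
unique_rows = list(dict.fromkeys(rows))

# 2. Set aside values outside the assumed 2-digit range for manual review.
suspicious = [r for r in unique_rows if r[1] is not None and not 10 <= r[1] <= 99]
trusted = [r for r in unique_rows if r not in suspicious]

# 3. Fill blanks with the rounded average of the remaining known values.
known = [value for _, value in trusted if value is not None]
fill_value = round(mean(known))
cleaned = [(day, fill_value if value is None else value) for day, value in trusted]

print(cleaned)
print("Needs review:", suspicious)
```

Note that the suspicious value is set aside rather than deleted, because it still has to be checked by a person before deciding whether it is wrong.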
Once you have cleaned your data, the third step is to use some methods to understand it better:
- Aggregation functions in Excel, such as SUM and COUNT
- Conditional formatting using colors
Using these methods, you might be able to identify some trends or understand the data better than before.
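For anyone who wants to try the same ideas outside Excel, here is a small sketch in plain Python. The sales figures are invented, and the "HIGH" label is only a rough stand-in for conditional formatting, since plain text has no cell colors.

```python
# Hypothetical cleaned daily sales.
sales = {"Mon": 42, "Tue": 38, "Wed": 44, "Fri": 51}

total = sum(sales.values())   # similar to Excel's SUM
count = len(sales)            # similar to Excel's COUNT
average = total / count

# Rough stand-in for conditional formatting: mark days above the average.
highlighted = {day: ("HIGH" if value > average else "") for day, value in sales.items()}

print(f"total={total} count={count} average={average}")
for day, value in sales.items():
    print(day, value, highlighted[day])
```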
Once you understand the data better, you can start asking the following types of questions, which form the basics of data science:
- Descriptive: Questions like how many? how much?
- Associative: Questions about relationships in the data, such as whether you sell more ice cream when the temperature is high, or whether you sell more ice cream between 2 PM and 4 PM
- Comparative: Compare the ice cream flavors being sold, over a day or over a month
- Predictive: Predict something from the data, such as chocolate being the flavor most liked by children
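To make the first three question types concrete, here is a small sketch in plain Python. The records, the 25 degree cutoff for a "hot" day, and all variable names are made up for illustration. The predictive question is left out on purpose, because it needs a model rather than a simple calculation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (flavor, temperature in Celsius, units sold).
records = [
    ("chocolate", 32, 30),
    ("vanilla", 32, 20),
    ("strawberry", 32, 15),
    ("chocolate", 18, 12),
    ("vanilla", 18, 10),
    ("strawberry", 18, 5),
]

# Descriptive: how many ice creams were sold in total?
total_units = sum(units for _, _, units in records)

# Associative: do we sell more on hot days? (25 degrees is an assumed cutoff.)
hot_avg = mean(units for _, temp, units in records if temp >= 25)
cool_avg = mean(units for _, temp, units in records if temp < 25)

# Comparative: how do the flavors compare with each other?
units_by_flavor = defaultdict(int)
for flavor, _, units in records:
    units_by_flavor[flavor] += units
best_flavor = max(units_by_flavor, key=units_by_flavor.get)

print("Total units:", total_units)
print("Average on hot days:", hot_avg, "on cool days:", cool_avg)
print("Units by flavor:", dict(units_by_flavor), "best:", best_flavor)
```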
If you can ask the questions above, it means you understand your data quite well. The next step is to plot the data in different ways, such as graphs, charts, and histograms, and also to visualize it in 2D and 3D (pivot tables and charts).
Some common techniques are listed below:
- Charts based on X and Y axes: column chart, line chart
- Plotting of data: histogram, scatter plot
- Pie chart
- Box and whisker plot
- Pivot table and pivot chart
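Two of the techniques above, the pivot table and the histogram, are easy to sketch in plain Python without any charting library. The sales records and the bin width of 10 are assumptions chosen for illustration, and the histogram is drawn with text characters instead of a real chart.

```python
from collections import Counter, defaultdict

# Hypothetical sales records: (flavor, day, units sold).
sales = [
    ("chocolate", "Sat", 30),
    ("vanilla", "Sat", 20),
    ("chocolate", "Sun", 25),
    ("vanilla", "Sun", 22),
    ("chocolate", "Sat", 10),
]

# Pivot table: rows are flavors, columns are days, cells are summed units.
pivot = defaultdict(lambda: defaultdict(int))
for flavor, day, units in sales:
    pivot[flavor][day] += units

for flavor, by_day in pivot.items():
    print(flavor, dict(by_day))

# Histogram: count how many sales fall into each bin of width 10.
bins = Counter((units // 10) * 10 for _, _, units in sales)
for start in sorted(bins):
    print(f"{start}-{start + 9}: {'#' * bins[start]}")
```

In Excel the same result comes from inserting a pivot table with flavor as rows, day as columns, and the sum of units as values.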
So these are some basics of data science; more in the next blog post.
Data science is a huge subject that cannot be covered in one blog post. I have started a journey of acquiring a new skill, “Machine Learning”, and I am writing this blog as I learn, to keep notes for myself and for future learners. On the first day we learnt about the basics of data science.