Practical_Exam_Work
ID No. : 18IT003
Name : Janvi Hasmukhbhai Ajudiya
Subject : Data Science (IT-441)
Dataset : https://archive.ics.uci.edu/ml/machine-learning-databases/00426/
Task-1:
Dataset description using the Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Obtain the maximum classification accuracy possible by performing the following methods:
→Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection
Compare your accuracy with and without applying the pre-processing steps. Perform the classification and visualize the accuracy before and after pre-processing in Orange/Python.
Solution :
First, convert the csv_result-Autism-Adult-Data.arff file to csv_result-Autism-Adult-Data.csv using an online converter.
Open the Orange tool, select the File widget from the side panel and double-click it. Load the data file in .csv format, as shown below.
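Instead of an online tool, the ARFF-to-CSV conversion can also be done locally. The following is a minimal sketch using only the Python standard library; it assumes a dense ARFF file (one @attribute per line, comma-separated @data rows, optional single quotes) and does not handle sparse ARFF. Missing values written as "?" are copied through unchanged.

```python
import csv
import io


def arff_to_csv(arff_text: str) -> str:
    """Convert a dense ARFF document to CSV text (minimal sketch, not a full ARFF parser)."""
    names, rows = [], []
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            lower = stripped.lower()
            if lower.startswith("@attribute"):
                rest = stripped[len("@attribute"):].strip()
                if rest[0] in "'\"":
                    # Quoted name such as 'Class/ASD': take text up to the closing quote.
                    name = rest[1:rest.index(rest[0], 1)]
                else:
                    name = rest.split()[0]  # unquoted name ends at the first whitespace
                names.append(name)
            elif lower.startswith("@data"):
                in_data = True
        else:
            # Data rows are comma-separated; values may be wrapped in single quotes.
            rows.append(next(csv.reader([stripped], quotechar="'", skipinitialspace=True)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)  # header row from the @attribute names
    writer.writerows(rows)
    return out.getvalue()
```

To convert the actual file, read csv_result-Autism-Adult-Data.arff as text, pass it to arff_to_csv and write the returned string to csv_result-Autism-Adult-Data.csv.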
Set the target variable to Class/ASD. The target variable is what we need to predict, and it is chosen according to the dataset. By looking at its format and type (whether it is continuous or discrete), we can decide the target variable and whether the task is classification (discrete target) or regression (continuous target).
Here, target variable = Class/ASD
The classifiers used are Random Forest and Logistic Regression.
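The target-selection reasoning above can be sketched in Python as follows. This is an illustrative heuristic, not Orange's own rule: a target that is non-numeric, or numeric with only a few distinct values, is treated as a classification target. The threshold max_levels=20 is an arbitrary assumption, and "?" is assumed to mark missing values.

```python
import csv
import io


def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def load_features_and_target(csv_text, target="Class/ASD", max_levels=20):
    """Split CSV rows into feature dicts X and target list y, and guess the task type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    y = [row.pop(target) for row in rows]  # remove the target column from each row
    numeric = all(is_number(v) for v in y if v != "?")
    # Many distinct numeric values suggest a continuous target (regression);
    # otherwise the target is treated as discrete (classification).
    task = "regression" if numeric and len(set(y)) > max_levels else "classification"
    return rows, y, task
```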
Now add the Preprocess widget, connect it to the File widget and choose the following options:
o One feature per value, for encoding (one-hot)
o Normalize features, for normalization
o Impute missing values, for missing value handling
o Select relevant features, for feature selection
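The encoding, normalization and missing-value options above can be sketched in plain Python to show what each step does to the data. This mirrors the idea of the Preprocess widget, not its exact implementation: the sketch assumes "?" marks missing values, imputes the mean (numeric) or most frequent value (categorical), standardizes numeric columns to mean 0 and standard deviation 1, and creates one indicator column per categorical value.

```python
from collections import Counter
from statistics import mean, pstdev

MISSING = "?"  # assumed missing-value marker in the converted CSV


def preprocess(rows, numeric_cols, categorical_cols):
    """Impute, standardize and one-hot encode a list of row dicts (illustrative sketch)."""
    # 1) Missing-value handling: mean for numeric, most frequent value for categorical.
    fill = {}
    for c in numeric_cols:
        fill[c] = mean(float(r[c]) for r in rows if r[c] != MISSING)
    for c in categorical_cols:
        fill[c] = Counter(r[c] for r in rows if r[c] != MISSING).most_common(1)[0][0]
    cols = numeric_cols + categorical_cols
    imputed = [{c: (fill[c] if r[c] == MISSING else r[c]) for c in cols} for r in rows]

    # 2) Normalization: z-score each numeric column (guard against zero spread).
    stats = {}
    for c in numeric_cols:
        values = [float(r[c]) for r in imputed]
        sd = pstdev(values)
        stats[c] = (mean(values), sd if sd > 0 else 1.0)

    # 3) Encoding: one 0/1 feature per categorical value ("one feature per value").
    levels = {c: sorted({r[c] for r in imputed}) for c in categorical_cols}

    out = []
    for r in imputed:
        vec = {}
        for c in numeric_cols:
            mu, sd = stats[c]
            vec[c] = (float(r[c]) - mu) / sd
        for c in categorical_cols:
            for level in levels[c]:
                vec[f"{c}={level}"] = 1.0 if r[c] == level else 0.0
        out.append(vec)
    return out
```

For example, a missing age is replaced by the mean age before standardizing, so it ends up at exactly 0.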
The diagram above represents the complete data flow.
The two images below show the evaluation results, including precision and recall, with and without pre-processing.
Since this is a classification task, a confusion matrix is used. The four images below show the confusion matrices of Random Forest and Logistic Regression with and without pre-processing.
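The link between a binary confusion matrix and the reported accuracy, precision and recall can be sketched as follows. It assumes "YES" is treated as the positive class of Class/ASD; note that Orange's Test and Score may average precision and recall over both classes depending on its target-class setting, so its figures can differ from this single-class view.

```python
def confusion_counts(y_true, y_pred, positive="YES"):
    """Count true/false positives and negatives for a binary target."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def classification_scores(counts):
    """Accuracy, precision and recall computed from confusion-matrix counts."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For instance, 2 true positives, 1 false positive, 1 false negative and 1 true negative give an accuracy of 3/5 = 0.6 and a precision and recall of 2/3 each.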
Save the data table by clicking Save Data, and save it in .xlsx format, because it will be used later in PowerBI.
Here, you can see the Test and Score results with and without pre-processing. These report the classification accuracy, which is higher when the data is pre-processed. Thus, in this experiment, pre-processing the data increased the accuracy of the prediction models.
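The with/without comparison can also be reproduced in Python. The sketch below uses pandas and scikit-learn rather than Orange's own API, so its numbers will not match Test and Score exactly. Because scikit-learn estimators cannot fit on raw strings or missing values, the "without" baseline still applies ordinal encoding and simple imputation. The "with" variant adds one-hot encoding, standardization and feature selection. SelectPercentile with f_classif (an ANOVA F-test) stands in for Orange's feature scoring; percentile=75 and cv=5 are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler


def make_preprocessor(X, full):
    """Build either the minimal baseline or the full pre-processing for DataFrame X."""
    cat_cols = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
    num_cols = [c for c in X.columns if c not in cat_cols]
    if full:
        cat = make_pipeline(SimpleImputer(strategy="most_frequent"),
                            OneHotEncoder(handle_unknown="ignore"))
        num = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
    else:
        # Minimum needed for scikit-learn to fit at all: integer codes plus imputation.
        cat = make_pipeline(SimpleImputer(strategy="most_frequent"),
                            OrdinalEncoder(handle_unknown="use_encoded_value",
                                           unknown_value=-1))
        num = SimpleImputer(strategy="mean")
    # sparse_threshold=0 forces a dense output matrix.
    return ColumnTransformer([("cat", cat, cat_cols), ("num", num, num_cols)],
                             sparse_threshold=0)


def compare_accuracy(X, y, cv=5):
    """Mean cross-validated accuracy of each learner with and without full pre-processing."""
    learners = {
        "Random Forest": RandomForestClassifier(random_state=0),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, model in learners.items():
        for full in (False, True):
            steps = [make_preprocessor(X, full)]
            if full:
                steps.append(SelectPercentile(f_classif, percentile=75))  # feature selection
            steps.append(model)
            scores = cross_val_score(make_pipeline(*steps), X, y, cv=cv, scoring="accuracy")
            results[(name, "with" if full else "without")] = float(np.mean(scores))
    return results
```

On the real data this could be called as compare_accuracy(df.drop(columns=["Class/ASD"]), df["Class/ASD"]) with df = pd.read_csv("csv_result-Autism-Adult-Data.csv", na_values="?"), assuming the converted CSV keeps the Class/ASD header and uses "?" for missing values.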
Task-2:
Generate a dashboard of the pre-processed dataset from Task-1.
Find as many data insights as possible by plotting a bar chart, box plot, pie chart and stacked chart using PowerBI dashboard visualizations.
Select “Get Data” and then Excel Workbook to load your data. You can use a different format as per your requirement.
After that, select Sheet1 and click Load to load your data, as shown in the image above.
Only some fields will be loaded, because a fixed number of features was selected in the pre-processing step of Task-1. Since feature selection was applied, only the features you selected there are loaded here.
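The Select relevant features option ranks features by a chosen score (information gain is one of the options) and keeps the best ones, which is why only some columns reach PowerBI. A minimal standard-library sketch of information-gain ranking follows; it handles only discrete columns (continuous ones would need discretizing first), and the column names in the example are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Target entropy minus the expected entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(columns, labels, k):
    """Return the names of the k columns with the highest information gain."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, a hypothetical column that perfectly separates YES from NO scores 1 bit, while a column unrelated to the target scores 0 and would be dropped first.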
Here, I have created four different types of graphs for the dashboard by selecting the x-axis and y-axis fields along with their values.
After generating the individual reports, select the “Publish” option on the Home tab of PowerBI, then log on to https://app.powerbi.com, where you will find the workspace to which the reports were published. Combine all the reports into a single dashboard by pinning them. Now your dashboard is ready.
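To check what the four charts summarize, the underlying aggregates can be computed directly. This standard-library sketch shows the numbers behind a bar chart (counts per category), a pie chart (percentage shares), a box plot (five-number summary) and a stacked chart (counts per group and sub-category). The quartile method used here ("inclusive") is an assumption, and PowerBI visuals may compute quartiles differently.

```python
from collections import Counter, defaultdict
from statistics import quantiles


def category_counts(values):
    """Bar chart heights: number of rows per category."""
    return dict(Counter(values))


def pie_shares(values):
    """Pie slices: each category's share of all rows, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}


def five_number_summary(values):
    """Min, Q1, median, Q3 and max: the quantities a box plot draws."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def stacked_counts(groups, stacks):
    """Stacked chart: for each group (x-axis), the count of each stack category."""
    table = defaultdict(Counter)
    for g, s in zip(groups, stacks):
        table[g][s] += 1
    return {g: dict(c) for g, c in table.items()}
```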