ID No. : 18IT003

Name : Janvi Hasmukhbhai Ajudiya

Subject : Data Science(IT-441)

Dataset : https://archive.ics.uci.edu/ml/machine-learning-databases/00426/

Task-1:

Dataset Description using Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Obtain the maximum possible classification accuracy by applying the following methods.
→Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection

Compare your accuracy with and without applying pre-processing steps. Perform the Classification and visualize accuracy before and after preprocessing in Orange/Python.

First, convert the csv_result-Autism-Adult-Data.arff file to csv_result-Autism-Adult-Data.csv using an online tool.

Converting .arff file to .csv file format
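The same conversion can also be done in Python instead of an online tool. The following is only a minimal sketch using the standard library: the function name arff_to_csv is hypothetical, the sample text is made up for illustration, and it assumes a simple dense ARFF file (one comma-separated row per line after @data).

```python
import csv
import io


def arff_to_csv(arff_text: str) -> str:
    """Convert simple (dense) ARFF text to CSV text with a header row."""
    columns = []
    rows = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                rest = line[len("@attribute"):].strip()
                if rest[:1] in ("'", '"'):
                    # quoted attribute name, e.g. 'Class/ASD'
                    name = rest[1:rest.index(rest[0], 1)]
                else:
                    name = rest.split()[0]
                columns.append(name)
            elif lower.startswith("@data"):
                in_data = True
        else:
            # each data line is one comma-separated record
            rows.extend(csv.reader([line], quotechar="'", skipinitialspace=True))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample, not the real dataset; "?" marks a missing value in ARFF.
sample = """@relation demo
@attribute A1_Score {0,1}
@attribute age numeric
@attribute 'Class/ASD' {NO,YES}
@data
1,26,NO
0,?,YES
"""
csv_text = arff_to_csv(sample)
print(csv_text)
```

Missing values stay as "?" in the output, so they still need to be handled in the pre-processing step.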

Open the Orange tool, select the File widget from the side panel, and double-click it. Load the data file in .csv format as shown below.

Loading .csv file and changing role of id from feature to meta

Set the target variable to Class/ASD. The target variable depends on the dataset: by looking at its format and type (continuous or categorical), we can choose the target variable and decide whether the problem is a classification task.

Here, target variable = Class/ASD

The classifiers used are Random Forest and Logistic Regression.

The target variable is the value we need to predict.

Setting Class/ASD as target

Now add the Preprocess widget, connect it to the File widget, and choose the options below: One feature per value for one-hot encoding, Normalize Features for normalization, Impute Missing Values for missing value handling, and Select Relevant Features for feature selection.

Setting pre-processing as above for Encoder, Normalization and missing value handling
Setting pre-processing as above for feature selection
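For readers who prefer Python over Orange, the four pre-processing steps can be approximated with scikit-learn. This is a sketch under assumptions, not the exact Orange configuration: the helper name build_preprocessor, the toy columns, the mean and most-frequent imputation, the min-max scaling, and the ANOVA F-score for feature selection are all illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_preprocessor(numeric_cols, categorical_cols, k=5):
    """Imputation + normalization + one-hot encoding + feature selection."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),   # missing value handling
        ("scale", MinMaxScaler()),                    # normalization to [0, 1]
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),  # one feature per value
    ])
    columns = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("columns", columns),
        ("select", SelectKBest(score_func=f_classif, k=k)),  # keep k best features
    ])


# Made-up toy rows for illustration only (not the real dataset).
toy = pd.DataFrame({
    "age": [25.0, 31.0, np.nan, 40.0, 22.0, 35.0],
    "result": [3.0, 8.0, 7.0, 2.0, 9.0, 1.0],
    "gender": ["m", "f", "f", "m", "m", "f"],
}).astype({"gender": object})  # keep the text column as plain object dtype
labels = ["NO", "YES", "YES", "NO", "YES", "NO"]

prep = build_preprocessor(["age", "result"], ["gender"], k=2)
features = prep.fit_transform(toy, labels)
print(features.shape)
```

Keeping all steps in one Pipeline means the same transformations are fitted on the training folds only when the pipeline is later passed to cross-validation.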
Dataflow

The diagram above represents the complete data flow.

The two images below show the evaluation results, including precision and recall, with and without pre-processing.

Evaluation result of dataflow before preprocessing(Test and Score)
Evaluation result of dataflow after preprocessing(Test and Score)
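A comparable before/after comparison can be sketched in Python with scikit-learn. The data here is synthetic (generated by make_classification), so the numbers will not match the Orange screenshots; the standardization step, the F-score feature selection with k=10, and the 5-fold cross-validation are assumptions made for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, not the autism screening dataset.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)


def mean_accuracy(model, X, y, folds=5):
    """Mean cross-validated classification accuracy."""
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


learners = {
    "Random Forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
}

results = {}
for name, make_learner in learners.items():
    # Same learner, evaluated on raw features and behind a pre-processing pipeline.
    raw = mean_accuracy(make_learner(), X, y)
    processed = mean_accuracy(
        make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), make_learner()),
        X, y,
    )
    results[name] = (raw, processed)
    print(f"{name}: without pre-processing={raw:.3f}, with pre-processing={processed:.3f}")
```

Whether pre-processing helps, and by how much, depends on the data and the learner, so the two numbers should be compared for each classifier separately.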

Because this is a classification task, a confusion matrix is used. The four images below show the confusion matrices of Random Forest and Logistic Regression with and without pre-processing.

Confusion matrix of Random Forest Classification before pre-processing
Confusion matrix of Logistic Regression before pre-processing
Confusion matrix of Logistic Regression after pre-processing
Confusion matrix of Random Forest Classification after pre-processing
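To make the link between a confusion matrix and the precision and recall columns concrete, here is a small standard-library sketch. The labels are made up for illustration, the helper names are hypothetical, and "YES" is assumed to be the positive class.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="YES"):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def precision_recall(counts):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels, not taken from the report's results.
actual = ["YES", "NO", "YES", "NO", "NO", "YES"]
predicted = ["YES", "NO", "NO", "NO", "YES", "YES"]

counts = confusion_counts(actual, predicted)
precision, recall = precision_recall(counts)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(dict(counts), precision, recall, accuracy)
```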

Save the data table by clicking Save Data. Save it in .xlsx format, since it will be used later in PowerBI.

Saving data table after pre-processing in .xlsx format

Here, you can see the Test and Score results with and without pre-processing. They report the accuracy, and the accuracy is higher when the data is pre-processed. Thus, for this dataset, pre-processing increases the accuracy of the prediction model.

Task-2:

Generate a dashboard from the pre-processed dataset of Task-1.
Find the maximum data insights by plotting a bar chart, box plot, pie plot, and stack plot using PowerBI dashboard visualization.

Getting data for PowerBI as Excel Workbook format

Click “Get Data” and choose Excel Workbook to load your data. You can use a different format as per your requirement.

Loading data by selecting sheet1 and clicking on Load data

After that, select Sheet1 and click Load to load your data, as shown in the image above.

Fields of data table after loading data

Only some fields are loaded, because a fixed number of features was selected during pre-processing in the phase above. Because of this feature selection, only the chosen features are loaded here.

Here, I have created four different types of chart for the dashboard by selecting the x-axis and y-axis fields along with their values.

Stacked Column chart between Class/ASD and id
Clustered Column chart between Class/ASD and id
Pie chart of id and result
Doughnut chart of id and result
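The aggregation behind such charts can be sketched in plain Python. This assumes the charts count id values per category, which is an assumption about how the fields were summarized; the rows and helper names below are made up for illustration.

```python
from collections import Counter

# Made-up rows for illustration; the real table comes from the saved .xlsx file.
rows = [
    {"id": 1, "result": 8, "Class/ASD": "YES"},
    {"id": 2, "result": 3, "Class/ASD": "NO"},
    {"id": 3, "result": 2, "Class/ASD": "NO"},
    {"id": 4, "result": 9, "Class/ASD": "YES"},
    {"id": 5, "result": 3, "Class/ASD": "NO"},
]


def count_by(rows, field):
    """Number of rows (ids) per distinct value of a field, as in a column chart."""
    return Counter(row[field] for row in rows)


def shares(counts):
    """Fraction of the total per category, as shown by a pie or doughnut chart."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


class_counts = count_by(rows, "Class/ASD")
result_shares = shares(count_by(rows, "result"))
print(class_counts, result_shares)
```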

After generating the individual reports, select the “Publish” option on the home screen of PowerBI, then log in to https://app.powerbi.com, where you will find the published workspace. Combine all reports into a single dashboard by pinning them. Now your dashboard is ready.
