Commit d367596c authored by Sushant Mahajan

added README, removed old output file

parent 4e8b12ca
The first step was to analyze the data. I tried decision tree classifiers, but the resulting trees were too complex to draw any conclusions from.
I normalized the data, as I found it gave better results. The normalization was zero mean and unit variance.
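A minimal sketch of the zero-mean, unit-variance normalization described above (the function name and the epsilon guard against constant features are my own additions, not from the original):

```python
import numpy as np

def normalize(X, eps=1e-12):
    """Scale each feature (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma < eps] = 1.0  # leave constant features unscaled to avoid division by zero
    return (X - mu) / sigma
```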
Having computed the variance of every feature, I found that many features had very low variance. From this I deduced that these features, varying so little, were not contributing much to the prediction. After experimenting with a couple of values, I found that a variance threshold of 0.1 gave the best results.
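The variance-based feature selection could be sketched as below. The 0.1 threshold comes from the text; note that this filtering only makes sense on the raw (or merely centered) data, since unit-variance scaling would set every feature's variance to 1 — the exact ordering in the original pipeline is my assumption:

```python
import numpy as np

def select_by_variance(X, threshold=0.1):
    """Drop feature columns whose variance does not exceed the threshold."""
    keep = X.var(axis=0) > threshold  # boolean mask over columns
    return X[:, keep], keep
```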
The second insight came from varying the number of units in the hidden layer (I opted for a simple architecture with only one hidden layer). After checking a couple of values, I found that setting the hidden layer size to int(1.65 * input_layer_units) gave good results.
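The sizing heuristic above is just this arithmetic (the function name and the `factor` parameter are mine; the 1.65 constant is from the text):

```python
def hidden_units(input_layer_units, factor=1.65):
    """Hidden-layer size heuristic: int(1.65 * number of input units)."""
    return int(factor * input_layer_units)
```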
Thirdly, the gradient descent I applied to find the weight matrices was taking too much time with exact line search. So instead I experimented with a couple of values for the descent rate and kept the 5 best. After backpropagation, the descent is computed with each of these 5 rates, and the rate that yields the minimum cost is used for the next iteration.
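One update step of that scheme could look like the sketch below: try each candidate rate and keep the update with the lowest cost. The specific rate values, function names, and the single-matrix simplification are my assumptions — the text only says 5 rates were selected empirically:

```python
import numpy as np

def best_rate_step(w, grad, cost_fn, rates=(0.001, 0.01, 0.05, 0.1, 0.5)):
    """Apply gradient descent with each candidate rate; keep the cheapest result."""
    candidates = [w - r * grad for r in rates]
    costs = [cost_fn(c) for c in candidates]
    i = int(np.argmin(costs))
    return candidates[i], rates[i]  # new weights and the winning rate
```

This trades the per-iteration cost of an exact line search for five cheap cost evaluations, which is the speed/accuracy trade-off the paragraph describes.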