-The first step was to analyze the data. I tried looking at decision tree classifiers, but the resulting trees were too complex to draw any conclusions from.
-I normalized the data to zero mean and unit variance, as I found this gave better results.
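A minimal sketch of this zero-mean, unit-variance normalization, assuming the data is a NumPy array with one feature per column (the function name and toy data are my own):

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Scale each column (feature) to zero mean and unit variance.

    eps guards against division by zero for constant columns.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)

# toy data: 3 samples, 2 features on very different scales
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Z = standardize(X)
```

In practice the training-set mean and standard deviation should be saved and reused to transform validation and test data, so all splits are scaled identically.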
-Having computed the variance of every feature, I found that many features had very low variance. Since these features barely vary across samples, I deduced they were not contributing much to the prediction.
-After experimenting with a couple of values, I found that a variance threshold of 0.1 gave the best results.
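The variance-based feature selection described above can be sketched as follows; the helper name and toy data are my own, and the 0.1 threshold is the one from the text:

```python
import numpy as np

def select_by_variance(X, threshold=0.1):
    """Keep only the columns whose variance exceeds the threshold.

    Returns the reduced matrix and the boolean mask of kept columns.
    """
    variances = X.var(axis=0)
    mask = variances > threshold
    return X[:, mask], mask

# toy data: the second column is nearly constant (very low variance)
X = np.array([[1.0, 0.50],
              [2.0, 0.51],
              [3.0, 0.49]])
X_reduced, kept = select_by_variance(X, threshold=0.1)
```

Note that if normalization to unit variance is applied first, every surviving feature has variance close to 1, so the thresholding should be done on the raw (or only mean-centered) data.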
-The second insight was varying the number of units in the hidden layer (I opted for a simple architecture with a single hidden layer). After checking the results for a couple of values, I found that setting the number of hidden units to int(1.65 * input_layer_units) gave good results.
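The sizing heuristic amounts to a one-liner; the 1.65 factor is the one from the text, while the function name is mine:

```python
def hidden_units(input_layer_units, factor=1.65):
    """Heuristic from the write-up: hidden layer size = int(factor * input units)."""
    return int(factor * input_layer_units)
```

For example, a network with 20 input features would get a hidden layer of 33 units under this rule. This is an empirical choice, not a principled formula; the right width generally depends on the data set and should be validated.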
-Thirdly, the gradient descent I applied to find the weight matrices was taking too much time with exact line search. So I experimented with a couple of values for the descent rate and kept the 5 best. After backpropagation, the descent step is evaluated with each of the 5 rates, and the rate that yields the minimum cost is used for that iteration.
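This per-iteration rate selection can be sketched as below. A simple quadratic stands in for the network's cost and gradient (which would come from backpropagation), and the five candidate rates are assumptions, not the write-up's actual values:

```python
import numpy as np

# Hypothetical stand-ins: a quadratic cost with minimum at w = 3,
# in place of the network cost and backpropagated gradient.
def cost(w):
    return float(np.sum((w - 3.0) ** 2))

def grad(w):
    return 2.0 * (w - 3.0)

# 5 candidate descent rates (illustrative values only)
RATES = [0.001, 0.01, 0.05, 0.1, 0.3]

def descend(w, n_iters=50):
    """Each iteration, try the step for every candidate rate and keep
    whichever yields the lowest cost -- a cheap stand-in for exact line search."""
    for _ in range(n_iters):
        g = grad(w)
        w = min((w - r * g for r in RATES), key=cost)
    return w

w_final = descend(np.array([0.0]))
```

Compared with exact line search, this costs only 5 extra cost evaluations per iteration, which matches the speed motivation given in the text.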