[
  {
    "path": ".gitignore",
    "content": ".DS_Store\n*.git\n.git\n*.pyc\n*.tfevents.*\n*/__pycache__/*\n*.csv\ncheckpoint\n*.pb\n*.pbtxt"
  },
  {
    "path": "README.md",
    "content": "# Chancey, college admissions predictor.\n\n[ ![Codeship Status for pshah123/ChanceyNN](https://app.codeship.com/projects/920e3190-9232-0135-1d57-766c0c9c4a48/status?branch=master)](https://app.codeship.com/projects/250609) \n\nChancey is a predictor for college admissions based on GPA and SAT2400 data. Surprisingly enough, despite claims of a `holistic` approach, this model easily reaches ~80% accuracy for most colleges with only ~50 samples of data.\n\n## Reqs\n\n- Python (prefer 3.x)\n- TensorFlow 1.x (the code uses `tf.contrib.learn`, which was removed in TensorFlow 2.0; GPU or high-powered CPU recommended)\n- `console-logging` python module, for prettier logs, get it from `pip`\n- `numpy`, highly recommend using an Anaconda distribution of Python 3\n- `flask`, get it from `pip`\n\n## How it works\n\nThis is probably the simplest neural network you'll see today. I simply used TensorFlow's `DNNClassifier`, but instead of a traditional approach with hundreds of nodes, I experimented with the parameters and settled on three hidden layers of 10, 20, and 10 nodes. The implementation is simple and straightforward because both of our inputs (GPA and test score) are plain numbers.\n\nAfter training on a corpus of GPA+SAT data, it can predict admissions.\n\n## Training\n\nSee the README file in the `neuralnet` folder. You will need to call `main.py` from the repository root, e.g. `python neuralnet/main.py .. args ..`.\n\nAssemble a dataset CSV file. Move 1/3 of its rows into another CSV file; this new file is your test dataset.\n\n**Important: if you want the raw accuracy, set both training and testing to the same CSV and train for one step. Otherwise the reported accuracy may come out at exactly 0.5. This number is not meaningful: most likely the model is predicting the same outcome for every applicant, and the test dataset happens to contain exactly 11 acceptances and 11 rejections. With your own data you might instead see the share of acceptances (or rejections) in your test dataset. In that case, just train to 150k steps or until the loss is around 0.7 or below.**\n\n![Console](images/cmd.PNG)\n\nI have provided the CMU dataset I originally gathered by hand to train this network. More information on naming datasets is in the `neuralnet` README.\n\nQuick stats: GeForce 1060 (6 GB), ~4 minutes for 150k steps and ~78.5% accuracy.\n\nGraph of loss over 150k steps:\n\n![Loss](images/loss.PNG)\n\nGraph of accuracy over 150k steps:\n\n![Accuracy](images/loss.PNG)\n\n## Predictions\n\nRun `python website.py` from the repository root (you'll need Flask).\n\n![Form](images/form.png)\n\n## FAQS\n\n**Does this mean colleges don't care about me as a person for the most part?**\n\nPerhaps, perhaps not. As my wonderful stat teacher pointed out to me, GPA/SAT are not independent of you as a person. It is likely that many individuals in the dataset had GPA/SAT scores that corresponded to their extracurricular activities and essay quality. So no, this does not definitely prove that colleges ignore the person. Rather, it suggests that GPA/SAT are powerful metrics that can be used to filter applicants.\n\n**Won't this just scare me away from college apps? How can you be sure this works?**\n\nI'm not sure. That's why the predictor uses language like `likely` and `unlikely`. This isn't perfect, and college admissions are often random and influenced by external factors I can't predict. Don't let this dissuade you from applying to a college. Rather, simply use this to filter through schools if you're like me and had trouble narrowing down your list from 20+.\n\n**Isn't this network way too simple? Shouldn't you add an LSTM layer or RNN capabilities?**\n\nIt may be simple, but in this case it *works*. In later revisions I implemented LSTM cells, but they gave only minor improvements, and I am not at liberty to open-source those parts of the project. Rest assured that these are _minor_ improvements at best, at least in my experience.\n\nIf you have any ideas to make this more accurate, feel free to contribute! This repo is open to all."
  },
  {
    "path": "neuralnet/README.MD",
    "content": "This repository is no longer maintained. It has been updated and cleaned up to the last declassified (safe-to-release) version, from September 12, 2016.\n\nDue to a switch to proprietary information, we are no longer updating the corpus, model, or Python scripts.\n\nUsage:\n\n```\npython neuralnet/main.py path/to/dataset.csv path/to/test_dataset.csv MAX_GPA MAX_TEST_SCORE\n```\n\nRun from the repository root. `MAX_GPA` and `MAX_TEST_SCORE` are the highest possible GPA and test score (e.g. 5.0 and 2400), used for a sanity-check prediction. Each CSV has three columns and no header row: GPA, test score, and outcome (0 for rejection, 1 for admission).\n\nModels are saved under `train/model/<dataset filename without .csv>/`. To restore checkpoints and keep training the same model, keep the dataset filename the same. To use a new model, use a newly named dataset or delete the model from its directory under `train/model/`.\n\nNote: with the Carnegie Mellon corpus and the provided checkpoint (150,000 steps), relatively high accuracy can be achieved with further training.\n\nTraining stats:\nTrained on GeForce 1060, 6GB. ~4 minutes for 150,000 steps @ 78.5% accuracy on a relatively small dataset.\n\nNo overfitting observed. The cross-validation dataset and cross-validation scripts are not provided due to licensing issues. :sadparrot:\n\n*Interestingly, it appears that even if overfitting did occur, it may not affect our true accuracy, since college admissions themselves behave as if overfitted. This is a conjecture to be tested later.*\n\nThe `train` directory is at the root level. The `train` directory in this folder is used simply for testing."
  },
  {
    "path": "neuralnet/main.py",
    "content": "from __future__ import absolute_import, division, print_function\nimport os\nimport sys\nfrom sys import argv\n\n# Silence TensorFlow's C++ logging. This must be set before tensorflow is imported.\nos.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n\nimport tensorflow as tf\nimport numpy as np\nfrom console_logging.console import Console\n\nusage = \"\\nUsage:\\npython neuralnet/main.py path/to/dataset.csv path/to/crossvalidation_dataset.csv #MAX_GPA #MAX_TEST_SCORE\\n\\nExample:\\tpython neuralnet/main.py harvard.csv harvard_test.csv 6.0 2400\\n\\nEach dataset should have three columns and no header row: GPA, test score, and outcome (0 for rejection, 1 for admission).\"\n\nconsole = Console()\nconsole.setVerbosity(3) # only logs success and error\n\ntry:\n    script, dataset_filename, test_filename, maxgpa, maxtest = argv\nexcept ValueError:\n    # Wrong number of command-line arguments\n    console.error(\"Expected 4 arguments, got %d.\" % (len(argv) - 1))\n    print(usage)\n    sys.exit(1)\n\ndataset_filename = str(dataset_filename)\nmaxgpa = float(maxgpa)\nmaxtest = int(maxtest)\n\nif not dataset_filename.endswith(\".csv\"):\n    console.error(\"Filetype not recognized as CSV.\")\n    print(usage)\n    sys.exit(1)\n\n# Data sets\nDATA_TRAINING = dataset_filename\nDATA_TEST = test_filename\n''' We are expecting features that are floats (gpa, sat, act) and outcomes that are integers (0 for reject, 1 for accept) '''\n##\n\n# Load datasets using tf contrib libraries. The last CSV column is used as the target.\ntraining_set = tf.contrib.learn.datasets.base.load_csv_without_header(filename=DATA_TRAINING,\n                                                       target_dtype=int,features_dtype=float)\ntest_set = tf.contrib.learn.datasets.base.load_csv_without_header(filename=DATA_TEST,\n                                                   target_dtype=int,features_dtype=float)\n##\n\n# First two columns are gpa, sat/act, which are our features\nfeature_columns = [tf.contrib.layers.real_valued_column(\"\", dimension=2)]\n\n# Build a neural network with 3 hidden layers. The model is saved to ./train/model/<dataset name>/\n# I found 3 hidden layers with 10, 20, and 10 nodes respectively work well. You may find other setups.\nclassifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,\n                                            hidden_units=[10, 20, 10],\n                                            n_classes=3,\n                                            model_dir=\"./train/model/\"+dataset_filename[dataset_filename.rfind('/')+1:-4],\n                                            config=tf.contrib.learn.RunConfig(\n                                                save_checkpoints_secs=10))\n##\n\n# Helper functions\ndef get_train_inputs():\n    x = tf.constant(training_set.data)\n    y = tf.constant(training_set.target)\n    return x, y\n\ndef get_test_inputs():\n    x = tf.constant(test_set.data)\n    y = tf.constant(test_set.target)\n    return x, y\n##\n\nprint(\"How many steps should we train for?\")\nmaxsteps = int(input('> '))\n\n# Train the classifier for maxsteps steps, resuming from any checkpoint in model_dir.\nclassifier.fit(input_fn=get_train_inputs, steps=maxsteps)\n\n# Evaluate loss and accuracy on the test dataset.\nresults = classifier.evaluate(input_fn=get_test_inputs, steps=1)\nprint(results)\nconsole.success('\\nFinished with loss {0:f} and accuracy {1:f}'.format(results['loss'], results['accuracy']))\n\nprint(\"\\nPlease provide a GPA and test score to chance.\")\ncur_gpa = float(input('GPA: '))\nprint(\"Given \"+str(cur_gpa))\ntest_score = int(input('Test Score: '))\ndef new_samples():\n    # Three samples: a zero baseline, the user's input, and a perfect applicant\n    return np.array([[0.0, 0], [cur_gpa,test_score], [maxgpa, maxtest]], dtype=np.float32)\npredictions = list(classifier.predict(input_fn=new_samples))\nconsole.success(\"Made predictions:\")\n\ndef returnChance(chance):\n    if chance==0:\n        return \"rejection\"\n    if chance==1:\n        return \"admission\"\n    return \"unknown (class %s)\" % chance\n\nconsole.log(\"Testing:\\nGPA: 0\\nTest Score: 0\\nPrediction: %s\\nExpected: rejection\"%returnChance(predictions[0]))\nconsole.log(\"Testing:\\nGPA: %0.1f\\nTest Score: %d\\nPrediction: %s\\nExpected: admission\"%(maxgpa, maxtest, returnChance(predictions[2])))\nconsole.success(\"Predicting:\\nGPA: %0.2f\\nTest Score: %d\\nPrediction: %s\"%(cur_gpa, test_score, returnChance(predictions[1])))"
  },
  {
    "path": "neuralnet/predict.py",
    "content": "from __future__ import absolute_import, division, print_function\nimport os\nimport sys\n\n# Silence TensorFlow's C++ logging. This must be set before tensorflow is imported.\nos.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n\nimport tensorflow as tf\nimport numpy as np\nfrom console_logging.console import Console\nconsole = Console()\n\nusage=\"You shouldn't be running this file.\"\n\nconsole.setVerbosity(3)  # only error, success, log\n\nscript = 'predict.py'\ndataset_filename = './neuralnet/corpus/carnegie_mellon.csv'\nmaxgpa = 5.0\nmaxtest = 2400\ndataset_filename = str(dataset_filename)\nmaxgpa = float(maxgpa)\nmaxtest = int(maxtest)\nif dataset_filename[-4:] != \".csv\":\n    console.error(\"Filetype not recognized as CSV.\")\n    print(usage)\n    sys.exit(1)\n\n# Data sets\nDATA_TRAINING = dataset_filename\nDATA_TEST = dataset_filename\n''' We are expecting features that are floats (gpa, sat, act) and outcomes that are integers (0 for reject, 1 for accept) '''\n\n# Load datasets using tf contrib libraries. The last CSV column is used as the target.\ntraining_set = tf.contrib.learn.datasets.base.load_csv_without_header(filename=DATA_TRAINING,\n                                                                      target_dtype=int, features_dtype=float)\ntest_set = tf.contrib.learn.datasets.base.load_csv_without_header(filename=DATA_TEST,\n                                                                  target_dtype=int, features_dtype=float)\n##\n\n# First two columns are gpa, sat/act, which are our features\nfeature_columns = [tf.contrib.layers.real_valued_column(\"\", dimension=2)]\n##\n\n# Build a neural network with 3 hidden layers. The model is loaded from ./train/model/<dataset name>/\n# I found 3 hidden layers with 10, 20, and 10 nodes respectively work well. You may find other setups.\nclassifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,\n                                            hidden_units=[10, 20, 10],\n                                            n_classes=3,\n                                            model_dir=\"./train/model/\" +\n                                            dataset_filename[dataset_filename.rfind(\n                                                '/') + 1:-4],\n                                            config=tf.contrib.learn.RunConfig(\n                                                save_checkpoints_secs=60))\n##\n\n# Helper functions\ndef get_train_inputs():\n    x = tf.constant(training_set.data)\n    y = tf.constant(training_set.target)\n    return x, y\n\ndef get_test_inputs():\n    x = tf.constant(test_set.data)\n    y = tf.constant(test_set.target)\n    return x, y\n##\n\nmaxsteps = 1\n# Take a single training step. This resumes from the latest checkpoint in model_dir\n# (or initializes a new model if none exists) so that predict() has a model to load.\n# We're testing, not training.\nclassifier.fit(input_fn=get_train_inputs, steps=maxsteps)\n\ndef predict(cur_gpa, testscore, test_type):\n    \"\"\"Return a one-element list holding the predicted class (0 = rejection, 1 = admission).\"\"\"\n    #TODO: implement test_type\n    gpa_in = cur_gpa\n    testscore_in = testscore\n    def new_samples():\n        return np.array([[gpa_in, testscore_in]], dtype=np.float32)\n    predictions = list(classifier.predict(input_fn=new_samples))\n    return predictions"
  },
  {
    "path": "requirements.txt",
    "content": "tensorflow<2.0\nconsole-logging\nnumpy\nflask"
  },
  {
    "path": "templates/website.html",
    "content": "<html>\n    <head>\n        <title>College Predictor</title>\n        <link rel=\"stylesheet\" media=\"screen\" href=\"//netdna.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css\">\n        <script src=\"//code.jquery.com/jquery.js\"></script>\n        <script src=\"//netdna.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js\"></script>\n    </head>\n    <body>\n        \n        <div class=\"container\">\n            \n            <div class=\"page-header\">\n              <h1>The Dean<small> ® Priansh Shah 2017</small></h1>\n            </div>\n            <form action=\"/predict\" method=\"POST\" role=\"form\">\n                <legend>Chance yourself at {{ college['name'] }}! (accuracy: {{ college['accuracy'] }})</legend>\n            \n                <div class=\"form-group\">\n                    <label for=\"gpa\">GPA</label>\n                    <input type=\"text\" class=\"form-control\" id=\"gpa\" name=\"gpa\" placeholder=\"5.0\">\n                </div>\n            \n                <div class=\"form-group\">\n                    <label for=\"test_score\">SAT 2400 Equivalent</label>\n                    <input type=\"text\" class=\"form-control\" id=\"test_score\" name=\"test_score\" placeholder=\"2400\">\n                </div>                \n            \n                <button type=\"submit\" class=\"btn btn-primary\">Chance Me</button>\n            </form>\n            \n        </div>\n        \n    </body>\n</html>"
  },
  {
    "path": "test.py",
    "content": "from flask import Flask, render_template, request\nimport neuralnet.predict as pr\nfrom console_logging.console import Console\nconsole = Console()\napp=Flask(__name__)\n@app.route('/predict', methods=['POST'])\ndef predict():\n\n    # get form variables and type them, rejecting non-numeric input\n    try:\n        gpa = float(request.form[\"gpa\"])\n        score = int(request.form[\"test_score\"])\n    except (KeyError, ValueError):\n        return \"Please enter a numeric GPA and a whole-number test score.\", 400\n    console.info(\"Chancing GPA: %0.2f, test score: %d\"%(gpa,score))\n    predictions=[]\n\n    #TODO: implement test type. This is a stub.\n    if score<=36:\n        predictions=pr.predict(gpa,score,\"ACT\")\n    elif score<=1600:\n        predictions=pr.predict(gpa,score,\"SAT1600\")\n    else:\n        predictions = pr.predict(gpa,score,\"SAT2400\")\n    ##\n\n    if predictions[0]==1:\n        return \"Admission is likely.\"\n    elif predictions[0]==0:\n        return \"Admission is unlikely.\"\n    return \"Something went wrong.\"\n\n@app.route('/')\ndef home():\n    return render_template('website.html', college={'name':'CMU','accuracy':'78.6517'})"
  },
  {
    "path": "website.py",
    "content": "from flask import Flask, render_template, request\nimport neuralnet.predict as pr\nfrom console_logging.console import Console\nconsole = Console()\napp=Flask(__name__)\n@app.route('/predict', methods=['POST'])\ndef predict():\n\n    # get form variables and type them, rejecting non-numeric input\n    try:\n        gpa = float(request.form[\"gpa\"])\n        score = int(request.form[\"test_score\"])\n    except (KeyError, ValueError):\n        return \"Please enter a numeric GPA and a whole-number test score.\", 400\n    console.info(\"Chancing GPA: %0.2f, test score: %d\"%(gpa,score))\n    predictions=[]\n\n    #TODO: implement test type. This is a stub.\n    if score<=36:\n        predictions=pr.predict(gpa,score,\"ACT\")\n    elif score<=1600:\n        predictions=pr.predict(gpa,score,\"SAT1600\")\n    else:\n        predictions = pr.predict(gpa,score,\"SAT2400\")\n    ##\n\n    if predictions[0]==1:\n        return \"Admission is likely.\"\n    elif predictions[0]==0:\n        return \"Admission is unlikely.\"\n    return \"Something went wrong.\"\n\n@app.route('/')\ndef home():\n    return render_template('website.html', college={'name':'CMU','accuracy':'78.6517'})\n\nif __name__ == '__main__':\n    app.run(host='0.0.0.0')"
  }
]