Using my knowledge of machine learning and neural networks, I used the features in the provided dataset (charity.csv) to create a binary classifier capable of predicting whether applicants would be successful if funded by Alphabet Soup. Alphabet Soup's business team sent a CSV containing over 34,000 organizations that have received funding from Alphabet Soup over the years. Within this dataset are several columns that capture metadata about each organization. Let's see which ones are worth funding...
- Preprocessing Data for a Neural Network Model
- Compile, Train, and Evaluate the Model
- Optimize the Model
- EIN and NAME — Identification columns
- APPLICATION_TYPE — Alphabet Soup application type
- AFFILIATION — Affiliated sector of industry
- CLASSIFICATION — Government organization classification
- USE_CASE — Use case for funding
- ORGANIZATION — Organization type
- STATUS — Active status
- INCOME_AMT — Income classification
- SPECIAL_CONSIDERATIONS — Special consideration for application
- ASK_AMT — Funding amount requested
- IS_SUCCESSFUL — Was the money used effectively
- Data: charity.csv
- Google Colab Pro
- Python 3.7
- Pandas and TensorFlow
A target variable is the variable whose values are modeled and predicted by the other variables. Thus, IS_SUCCESSFUL is the target variable for this model: it contains binary data indicating whether or not the charity donation was used effectively.
The following columns are the features of the model because each provides information that can help predict the target: APPLICATION_TYPE, AFFILIATION, CLASSIFICATION, USE_CASE, ORGANIZATION, STATUS, INCOME_AMT, SPECIAL_CONSIDERATIONS, and ASK_AMT.
At the beginning of my analysis, I removed the EIN and NAME columns from the charity.csv dataset. These two columns are identification fields, such as the names of the applicants, and serve no purpose in predicting our target variable.
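The preprocessing looked roughly like the sketch below. This is a minimal outline, assuming the file name charity.csv and the illustrative binning cutoffs shown; the exact thresholds used in my notebook may differ.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset and drop the identification columns
application_df = pd.read_csv("charity.csv")
application_df = application_df.drop(columns=["EIN", "NAME"])

# Bin rare categorical values into an "Other" bucket (cutoffs are illustrative)
app_counts = application_df["APPLICATION_TYPE"].value_counts()
rare_apps = list(app_counts[app_counts < 500].index)
application_df["APPLICATION_TYPE"] = application_df["APPLICATION_TYPE"].replace(rare_apps, "Other")

class_counts = application_df["CLASSIFICATION"].value_counts()
rare_classes = list(class_counts[class_counts < 1000].index)
application_df["CLASSIFICATION"] = application_df["CLASSIFICATION"].replace(rare_classes, "Other")

# One-hot encode the categorical features
encoded_df = pd.get_dummies(application_df)

# Split into features and target, then into training and testing sets
y = encoded_df["IS_SUCCESSFUL"].values
X = encoded_df.drop(columns=["IS_SUCCESSFUL"]).values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)

# Scale the feature data
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```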
To address the limitations of a basic neural network, we can build a more robust model by adding additional hidden layers. A neural network with more than one hidden layer is known as a deep neural network. These additional layers can observe and weigh interactions between clusters of neurons across the entire dataset, which means they can identify and account for more information than any number of neurons in a single hidden layer.

In my AlphabetSoupCharity.ipynb file, I used only two hidden layers: 80 neurons in the first layer and 30 in the second, with 3,600 weight parameters in the first layer and 2,430 in the second. This model achieved an accuracy score of only 0.729 (73%), short of the 75% I had hoped for. Thus, in my next neural network model, I decided to switch some things around to see if I could reach my desired score.
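A sketch of that first two-layer model is below, assuming the scaled training data from the preprocessing step above. The number of input features (taken here from the training array) is what determines the per-layer weight-parameter counts mentioned above; the epoch count is illustrative.

```python
import tensorflow as tf

number_input_features = X_train_scaled.shape[1]

# Two hidden layers (80 and 30 neurons) with ReLU, sigmoid output for binary classification
nn = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units=80, activation="relu", input_dim=number_input_features),
    tf.keras.layers.Dense(units=30, activation="relu"),
    tf.keras.layers.Dense(units=1, activation="sigmoid"),
])

nn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
nn.summary()  # prints the per-layer weight parameter counts

# Train the model, then evaluate it on the test set
fit_model = nn.fit(X_train_scaled, y_train, epochs=100)
model_loss, model_accuracy = nn.evaluate(X_test_scaled, y_test, verbose=2)
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")
```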
This second model received an accuracy score of exactly 0.7301 (73%). I was closer this time, but not close enough; adding a third hidden layer did not make a very big impact.
For this model, I took a new approach entirely. Instead of using the Rectified Linear Unit (ReLU) function, as I had done for my first and second models, I used the Sigmoid function and incorporated a fourth hidden layer. Still, I fell short: the result was an accuracy score of 0.7314 (73%). Though I was getting closer to my goal of 75%, the models weren't increasing the accuracy score fast enough.
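For reference, this third attempt looked roughly like the sketch below. The neuron counts per hidden layer are hypothetical assumptions; the key changes were the Sigmoid activations in the hidden layers and the fourth hidden layer.

```python
import tensorflow as tf

# Four hidden layers with sigmoid activations (neuron counts here are hypothetical)
nn_opt = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units=80, activation="sigmoid", input_dim=number_input_features),
    tf.keras.layers.Dense(units=30, activation="sigmoid"),
    tf.keras.layers.Dense(units=20, activation="sigmoid"),
    tf.keras.layers.Dense(units=10, activation="sigmoid"),
    tf.keras.layers.Dense(units=1, activation="sigmoid"),
])

nn_opt.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```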
For my final model, I decided to use the ReLU function again but with only two hidden layers. The results did not differ much yet again, so the overall goal of 75% accuracy remained out of reach.
After building four different neural network models, I was unable to achieve the desired 75% accuracy score. I changed the number of hidden layers and the activation functions between models, but my results varied by only about 0.5%. The highest accuracy score I achieved was 0.7314, which came from attempt 3, where I used four hidden layers along with the Sigmoid function. For this dataset, I noticed that the Sigmoid function gave slightly better accuracy results than the ReLU function.