How about input and output activation functions changing? #22
Comments
Hi! This is an interesting idea! In the CPPN case (with different activations but trained weights), if the output or input would benefit from having a different activation, it can be found by placing a node with that activation directly in front or behind, but in the WANN case this complicates things a bit. Using them directly on the inputs is especially interesting. In the swing-up case we saw in the best solutions that the 'd_theta' input was only ever connected to the network through a Gaussian activation: this is a symmetric activation, so it only produced signal based on whether it was moving or not, disregarding any directional information. It is possible that applying an activation directly to the input could also do some useful preprocessing of the inputs. I have to say I was surprised by how much simpler the 'line detector' is! Experiments like these, to develop 'kernels' for convolution, were something we thought about trying, and I think they are very promising. Nice work! Also, the plots are beautiful, what did you use to create them?
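The point about the Gaussian activation discarding direction can be illustrated with a minimal sketch. It assumes one common form of the Gaussian, exp(-x^2 / 2); the exact form used in the swing-up experiments is not stated in this thread, and `gaussian` and `describe` are hypothetical names used only for illustration.

```python
import math


def gaussian(x: float) -> float:
    """Gaussian bump centred at zero (one common form, assumed here)."""
    return math.exp(-(x * x) / 2.0)


def describe(d_theta: float) -> tuple:
    """Return the (gaussian, tanh) responses for an angular velocity."""
    return gaussian(d_theta), math.tanh(d_theta)


# The Gaussian response is identical for +v and -v, so only the speed
# survives; tanh is odd, so it keeps the sign (the direction).
for d_theta in (-1.5, 0.0, 1.5):
    g, t = describe(d_theta)
    print(f"d_theta={d_theta:+.1f}  gaussian={g:.3f}  tanh={t:+.3f}")
```

Any even function would behave the same way; the Gaussian is simply the one mentioned in the comment above.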
Thanks for your reply! 👏 I'm glad you appreciate my research. I used Figma to create these plots.
I'm just here to say this is the highest quality GitHub issue I have ever read. 😮
Hello @picolino. Could you please explain how the network works in the first XOR test? I do not understand how you get the right answer. If we take the first neural network, feed (1, 1) into it, and use the weight -2, the first hidden layer (the neuron with the Inverse function) gives (1*(-2) + 1*(-2))^-1 = -0.25, not 0. Please explain if I have misunderstood something.
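The arithmetic in this question can be checked with a short sketch. It follows the question in reading "Inverse" as the reciprocal 1/x. The negation reading -x is shown only as a second, hypothetical interpretation, because the thread does not say which definition the network uses. The function names are illustrative and not taken from any implementation.

```python
def pre_activation(x1: float, x2: float, w: float) -> float:
    """Weighted sum of two inputs that share a single weight w."""
    return x1 * w + x2 * w


def reciprocal(x: float) -> float:
    """'Inverse' read as 1/x, as assumed in the question."""
    return 1.0 / x


def negation(x: float) -> float:
    """'Inverse' read as -x (an alternative, hypothetical reading)."""
    return -x


s = pre_activation(1.0, 1.0, -2.0)
print(s)              # -4.0
print(reciprocal(s))  # -0.25, the value computed in the question
print(negation(s))    # 4.0
```

Neither reading gives 0 for this single neuron, so the answer to the question depends on the rest of the architecture, which is not reproduced here.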
Hello. I was really delighted by this new type of structural optimization of neural networks.
Thank you for your work, it is really awesome. 👏
I am currently doing detailed research into the architectures that WANN algorithms can generate for classification tasks. At some point I tried changing the activation functions in the input and output layers, and I got interesting results:
Experiments
Hyperparameters applied in all experiments:
Weights set: -2, -1, -0.5, 0.5, 1, 2
Max generations: 1024
Population size: 512
Rank by performance only / by network complexity: 20% / 80%
Add connection probability: 20%
Add neuron probability: 25%
Change activation function probability: 50%
Enable disabled connection probability: 5%
Keep best species in next population (elitism): 20
Destroy bad species in next population (cull): 20
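For anyone who wants to reproduce the setup, the list above could be captured as a configuration object. This is only a sketch: the class and field names are hypothetical and do not come from any WANN implementation, and the unit of the elitism and cull values is not stated in the list, so they are kept as plain numbers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WannHyperparameters:
    """Hyperparameters from the list above (field names are hypothetical)."""

    shared_weights: Tuple[float, ...] = (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)
    max_generations: int = 1024
    population_size: int = 512
    # Probability of each ranking mode when sorting the population.
    prob_rank_by_performance_only: float = 0.20
    prob_rank_by_complexity: float = 0.80
    # Mutation probabilities.
    prob_add_connection: float = 0.20
    prob_add_neuron: float = 0.25
    prob_change_activation: float = 0.50
    prob_enable_connection: float = 0.05
    # Unit (count or percent) is not stated in the source list.
    elitism: int = 20
    cull: int = 20


config = WannHyperparameters()
print(config)
```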
XOR
Experiment 1:
Architecture generated without changing the activation functions in the input and output layers:
Mean error (for all weights): 0
Experiment 2:
Architecture generated with changing the activation functions in the input and output layers:
Mean error (for all weights): 0
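A "mean error (for all weights)" figure like the ones reported here could be computed roughly as follows. This is a sketch under assumptions: the error is taken to be the mean absolute error over the four XOR samples, averaged over the shared weight set, and the exact metric used in the experiments is not stated. `toy_or_network` is a deliberately wrong placeholder, not one of the generated architectures.

```python
from typing import Callable, Sequence, Tuple

Inputs = Tuple[float, float]
Sample = Tuple[Inputs, float]

XOR_SAMPLES: Sequence[Sample] = (
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 1.0),
    ((1.0, 0.0), 1.0),
    ((1.0, 1.0), 0.0),
)
SHARED_WEIGHTS = (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)


def mean_error_over_weights(
    forward: Callable[[Inputs, float], float],
    samples: Sequence[Sample],
    weights: Sequence[float],
) -> float:
    """Average the per-weight mean absolute error over all shared weights."""
    per_weight = []
    for w in weights:
        errors = [abs(forward(x, w) - target) for x, target in samples]
        per_weight.append(sum(errors) / len(errors))
    return sum(per_weight) / len(per_weight)


def toy_or_network(x: Inputs, w: float) -> float:
    """Placeholder: fires when the weighted sum is non-zero (OR, not XOR)."""
    return 1.0 if abs(w * x[0] + w * x[1]) > 0 else 0.0


# The placeholder is wrong only on (1, 1), for every weight.
print(mean_error_over_weights(toy_or_network, XOR_SAMPLES, SHARED_WEIGHTS))  # 0.25
```

A network that reaches a mean error of 0, as both experiments above report, has to be correct on every sample for every weight in the set.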
Straight lines detection
Inputs: 3x3 square images.
Outputs: the 2 squares on the right side of each set.
If only a horizontal line exists, the output must be (1, 0).
If only a vertical line exists, the output must be (0, 1).
If both exist, the output must be (1, 1).
If no straight line exists, the output must be (0, 0).
Target: teach the neural network to detect straight (black) lines in a 3x3 image.
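The labelling rule above can be written down as a small reference function. It is a sketch that assumes black pixels are encoded as 1 and that a "straight line" means a completely black row or column. Neither assumption is spelled out in the issue, and `line_labels` is a hypothetical name.

```python
from typing import List, Tuple

Image = List[List[int]]  # 3x3 grid; 1 = black pixel (assumed encoding)


def line_labels(image: Image) -> Tuple[int, int]:
    """Return (horizontal, vertical) flags for a 3x3 binary image."""
    # A horizontal line is any row whose three pixels are all black.
    horizontal = any(all(px == 1 for px in row) for row in image)
    # A vertical line is any column whose three pixels are all black.
    vertical = any(all(image[r][c] == 1 for r in range(3)) for c in range(3))
    return int(horizontal), int(vertical)


examples = {
    "horizontal": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "vertical": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "both": [[1, 1, 1], [1, 0, 0], [1, 0, 0]],
    "none": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
}
for name, image in examples.items():
    print(name, line_labels(image))
```

A function like this could generate target outputs for all 512 possible 3x3 binary images, although the issue does not say which subset was used for training.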
Experiment 1:
Architecture generated without changing the activation functions in the input and output layers:
Mean error (for all weights): 0.0455
Experiment 2:
Architecture generated with changing the activation functions in the input and output layers:
Mean error (for all weights): 0.0469
Conclusions
Changing the activation functions in the input and output layers can reduce network complexity with little or no loss of accuracy (the mean error stayed at 0 for XOR and changed from 0.0455 to 0.0469 for line detection).
It may also reduce the required computation.
I guess this is because of the connections that go from input to hidden nodes and from hidden to output nodes. In some tasks they can really interfere with optimization, so the synthesis algorithm has to "destroy" them by adding extra nodes and connections.
P.S. I really hope my investigation can help improve this awesome algorithm for the structural synthesis of neural networks.
❤