A step-by-step gentle journey through the mathematics of neural networks, and making your own using the Python computer language.
Neural networks are a key element of deep learning and artificial intelligence, which today is capable of some truly impressive feats. Yet too few really understand how neural networks actually work.
This guide will take you on a fun and unhurried journey, starting from very simple ideas, and gradually building up an understanding of how neural networks work. You won’t need any mathematics beyond secondary school, and an accessible introduction to calculus is also included.
The ambition of this guide is to make neural networks as accessible as possible to as many readers as possible – there are enough texts for advanced readers already!
You’ll learn to code in Python and make your own neural network, teaching it to recognise human handwritten numbers and achieving performance comparable to professionally developed networks.
medium: interactive installation
space requirements: 2.5 square metres
There is something very engaging about the unexpected. The unexpected in this generative art installation lies in the direct, gestural influence the spectator has on a graphical interactive display, and in how they react to or notice the sounds they are making. There is a lot for the person to learn about the work: how they fit into it, what they add to it, and how they can construct unique and unusual sounds and graphics that feel like part of a performance. Their performance. They are the conductors of the piece.
The project is an installation of a symbolic journey in darkness with an element of hope. The spectator embarks on a symbolic journey of interaction and creation of both sound and visuals. The performance starts when the spectator enters a small light-projected arena. They stay in a centre area until the sensors pick them up and the whole system is reset.
The graphics projected onto the floor are reset to nothing, and so is the sound score. The person notices that as they travel around the projected arena, the graphics seem to approach them or back away from them depending on the type of motion they make. They also notice that the sounds they hear are made by their own movements: each time they move, they create a series of sounds. The darkness side of the work corresponds to a black generative graphic that spreads towards the spectator if they do not move.
The hope of the piece lies in the somewhat dissonant landscape sounds created by the movement of the person: partly noise and partly tonal.
The balance between graphic and sound (darkness and hope) lies in how the number of movements and the distance moved relate to the number of sound samples generated and to the way the graphic moves towards or away from the spectator. The sound is also influenced by the person's movement (via a motion detection algorithm).
Creative Motivation
The motivation for the piece came from a soundscape I made in MAXMSP using a Fourier transform (the sound called “S00_FOURIER” in the MAXMSP patch). I found the sound quite haunting and began to think about how visuals and interaction could form the basis of an artwork.
Future Development
For further development I would like to work with a contemporary dancer, using the artwork as part of a performance. I would see that as a first step; from there I would intend to develop a group performance (2 or 3 people).
Why make the work? When I see a person interact with my artwork, I sometimes feel it elevates the work: it takes it in a new, unexpected direction. That, for me, is quite beautiful and very rewarding, which is why I wanted to make it. Who is it for? I would say the basic version is for everyone; people find it stimulating and unexpected (especially the gesture interaction). As a performance (dance) piece I would use it only as the start of something, and develop it further into a narrative piece of audio-visual work.
THE RESULT
MACHINE LEARNING
I wanted to use Machine Learning with this work to experiment with unusual outcomes. This is something Machine Learning can do very well, as it can map non-linearly and quite arbitrarily using different features. It's very fast to train when using Wekinator, and changes can be made quickly as well. An added bonus in a new version of Wekinator is the ability to use multiple training sets in one piece of work, switching between them whenever you like; this too could be decided by Wekinator. When you see all these possibilities, Machine Learning becomes a valuable tool for interactive installations.
What datasets did you use? I used a mixture of straightforward mappings and arbitrarily trained data, with inputs chosen either for a logical reason or arbitrarily across different features. Each feature and data set is discussed in greater detail below.
The final project used two trained data sets:
WEK_TRAIN09C and WEK_TRAIN08D. The installation switched between the two every minute.
WEK_TRAIN08D : Training data set 01
WEK_TRAIN09C: Training data set 02
This project overlaps with my MAXMSP project. Half of this project was designed as a MAXMSP interactive sound experience, and the graphic experience was designed for machine learning. The MAX project was then separated out by removing all the machine learning code and using only hardcoded inputs and outputs. A summary of the differences between the two projects is as follows:
The machine learning installation uses 20 trained inputs. The MAXMSP installation uses 9 hardcoded inputs.
See figure below:
Hardcoded inputs:
PERSON POSITION X AND Y
VELOCITY VALUES ABS
ACCELERATION VALUES
TIME PASSED IN MINUTES
SYSTEM ACTIVE OR PASSIVE
GESTURES SPLIT INTO 4 SIGNALS
GESTURES X AND Y
Wekinator Helper added values:
Accel Max. value over the last 10 readings
Acceleration Standard Deviation
Position X and Y Standard Deviation
Wekinator trained outputs:
No. – Name
1, 2 – PERSON POSITION X AND Y
3 – VELOCITY VALUES ABS
4 – ACCELERATION VALUES
5 – TIME PASSED IN MINUTES
6 – SYSTEM ACTIVE OR PASSIVE
7 – Accel Max. value over the last 10 readings
8 – Acceleration Standard Deviation
9, 10 – Position X and Y Standard Deviation
11 – Conductor Events 1-10
12 – Conductor Time milliseconds 0-60000
13 – Graphics Trigger Values
14 – Color Trigger Values
15 – Particle Maxspeed
16 – Particle Maxforce
17 – Mix up Graphics 1-5
18 – Corners 1-4
19, 20 – Gesture positions X and Y
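For orientation, here is a minimal sketch of how the graphics patch could pick up these 20 outputs over OSC in Processing with the oscP5 library. It assumes Wekinator's default output message /wek/outputs arriving on port 12000, that all 20 outputs arrive as floats in one message, and that the positions are normalised 0–1; the variable names are mine, not the project's actual code.

```java
// Minimal sketch: receive the 20 Wekinator outputs over OSC (assumed defaults:
// message "/wek/outputs" arriving on port 12000) and store them for the graphics patch.
import oscP5.*;

OscP5 oscIn;
float[] wekOutputs = new float[20];   // outputs 1-20 as listed above

void setup() {
  size(400, 400);
  oscIn = new OscP5(this, 12000);     // listen on Wekinator's default output port
}

void oscEvent(OscMessage m) {
  if (m.checkAddrPattern("/wek/outputs")) {
    for (int i = 0; i < wekOutputs.length && i < m.typetag().length(); i++) {
      wekOutputs[i] = m.get(i).floatValue();
    }
  }
}

void draw() {
  background(0);
  // e.g. outputs 1 and 2 are the person's position (assuming values normalised 0-1)
  ellipse(wekOutputs[0] * width, wekOutputs[1] * height, 20, 20);
}
```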
Machine Learning Development
The Machine Learning development started with a straightforward solution of direct mappings to match the hardcoded versions (outputs 1 to 6). I did this so that communication came from only one source rather than two separate sources, and also to have the ability to train non-linearly. The position of the person is a very important interaction with the installation. Graphically this is fundamental to making the person feel like they are interacting with the piece: wherever they move, the graphics tend to follow suit. All the data sets were set up by me using my own data.
What machine learning and/or data analysis techniques have you used, and why?
The project uses the Wekinator Helper and the main Wekinator project, trained to select different graphics; whether the graphics approach or move away from the user; the force of attraction to the user; when new graphics are introduced; and when, which, and for how long generative music and samples are played. The algorithms used were polynomial regression, neural networks, or k-nearest neighbour classification.
Adding Variety
11 and 12. Conductor Events 1-10 and Conductor Time milliseconds 0-60000.
These values are used by MAXMSP. They control which sounds are made and for how long, and are in turn triggered by the pos-y of the person. Depending on where the person is standing, they will call one of two kinds of sound: either generative only, or generative plus samples. This is what the Conductor Event does for the project.
The Conductor Events were trained on pos-y co-ordinates: depending on where you are standing, a different value between 1 and 10 is produced. Conductor Events used k-nearest neighbour to select events, while Conductor Time used polynomial regression to select the time, also based on pos-y.
The results were very good. It was not predictable which time or which event would be chosen, as the person was generally moving continuously. I was very happy with this setup.
In set 02 of the training data, Conductor Events used pos-x, and Conductor Time used pos-x as well. Although in theory the results should be similar, they were not: people tend to move less in X than in Y!
The standard deviation of acceleration, or acceleration exceeding a set limit, would also stimulate the generative sounds when they were chosen to be played by the Conductor Events. This was a key aspect of the setup.
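To make the conductor idea concrete, here is a small Processing-side stand-in for the gating logic described above; it is my own illustration, not the actual S_CONDUCTOR1 patch (which lives in MAXMSP), and the function names are hypothetical.

```java
// Illustrative only: a Processing-side stand-in for the S_CONDUCTOR1 logic in MAXMSP.
// conductorEvent (1-10) picks a sound, conductorTime (0-60000 ms) says how long to let it play.
int  activeSound   = 0;   // which of the sound patches is currently allowed to play
int  soundStartMs  = 0;   // when it was triggered
int  soundLengthMs = 0;   // how long it may play for

void triggerConductor(float conductorEvent, float conductorTime) {
  activeSound   = constrain(round(conductorEvent), 1, 10);
  soundStartMs  = millis();
  soundLengthMs = (int) constrain(conductorTime, 0, 60000);
  // In the real installation this decision is forwarded to MAXMSP over OSC;
  // here we just remember it.
}

boolean soundAllowed() {
  // The chosen sound is gated: it only plays for soundLengthMs after being triggered.
  return activeSound > 0 && (millis() - soundStartMs) < soundLengthMs;
}
```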
13 and 14. Graphics Trigger Values/Color Trigger Values.
These values controlled whether a graphic is created and whether it changes colour. They trigger the different graphic counters and add all of the “bright swarms” and “black swarms”.
If the person's acceleration was above the Graphics Trigger Value or the Color Trigger Value, it would trigger new graphics or change the colour of those graphics respectively.
These values were produced in Wekinator using polynomial regression and a neural network. I had some slight issues training to zero with input values that only existed when there was acceleration; I couldn't get exactly a zero value. I also had a problem training to non-zero starting points: the models would not give output ranges that didn't start at 0. However, these were minor problems, as remapping the Wekinator values in the destination program was easy.
In set 01 of the training data, Graphics Trigger Values were trained against the acceleration max over a window of 10 with polynomial regression, and Color Trigger Values against the acceleration standard deviation over a window of 10 with a neural network. Both gave me good results.
These were logical choices for activating new shapes on the floor when the graphics trigger control value was exceeded by a fast or erratic movement.
The Graphics Trigger Values limits were further constrained to produce reasonable results. (A sudden movement would push the acceleration max up dramatically, so the value was confined to medium changes; otherwise hardly any graphics would be made.) The Color Trigger Values limits were restricted as well, to give a good balance between non-excited and excited graphics colours.
These were good parameters to get Wekinator to train with as they had a lot of variety in readings and were not constant.
In set 02 of the training data I chose the standard deviation of position x and y for both the graphics trigger and the colour trigger changes. These were more sensitive than the acceleration values and gave better results.
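The remapping and thresholding described above could look roughly like this in Processing; the ranges and the helper names (spawnSwarm, changeColours) are made up for illustration and are not the project's code.

```java
// Sketch of the thresholding/remapping idea. Wekinator outputs 13 and 14 act as
// trigger thresholds; movement features are compared against them, and the Wekinator
// values are remapped/constrained at the destination because exact 0 start values
// were hard to train.
float graphicsTrigger;   // Wekinator output 13
float colorTrigger;      // Wekinator output 14

void checkTriggers(float accelMax10, float accelStdDev10) {
  // Remap the raw Wekinator values into a useful, constrained range
  // (ranges here are placeholders, not the installation's actual limits).
  float gLimit = constrain(map(graphicsTrigger, 0, 1, 0.2, 0.6), 0.2, 0.6);
  float cLimit = constrain(map(colorTrigger,    0, 1, 0.1, 0.5), 0.1, 0.5);

  if (accelMax10 > gLimit) {
    // a fast/erratic movement: spawn a new swarm
    // spawnSwarm();       // hypothetical helper
  }
  if (accelStdDev10 > cLimit) {
    // enough variation in movement: switch to the "excited" colour palette
    // changeColours();    // hypothetical helper
  }
}
```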
15 and 16. Particle max speed and Particle max force.
These parameters control, for all the particle swarms, the particle speed and the force of attraction to the target.
These outputs were trained on the x and y standard deviations respectively, with polynomial regression and a neural network. I didn't notice any real difference in performance between the polynomial regression and the neural network. They worked well but had to be further refined to avoid extreme outputs. However, they produced unexpected patterns of swarming behaviour, ranging from very small movements to circular patterns. The small movements were expected, but the circular patterns were not. That was really nice and totally unexpected.
In set 02 of the training data I chose to train on just position x and y.
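As a rough sketch of how outputs 15 and 16 can shape a particle, here is a simplified seek behaviour in the spirit of the Shiffman-style vehicles used in the project; the class and its fields are my own simplification, not the installation's actual vehicle classes.

```java
// Illustration of how Wekinator outputs 15 (maxSpeed) and 16 (maxForce) could shape
// a swarm particle: higher force means tighter turning toward the target.
class SimpleVehicle {
  PVector pos = new PVector(random(width), random(height));
  PVector vel = new PVector();
  float maxSpeed;   // Wekinator output 15
  float maxForce;   // Wekinator output 16

  void seek(PVector target) {
    PVector desired = PVector.sub(target, pos);
    desired.setMag(maxSpeed);               // want to travel at full speed toward the target
    PVector steer = PVector.sub(desired, vel);
    steer.limit(maxForce);                  // but steering is limited by the attraction force
    vel.add(steer);
    vel.limit(maxSpeed);
    pos.add(vel);
  }
}
```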
17. Mix up Graphics 1-5.
This value controls which graphics are seen once the correct counter values for each graphic have been triggered.
The data was trained using k-nearest neighbour classification with an input of pos-y. “Mix up Graphics” outputs a value from 1 to 5 based on the Y position of the user. The results were good: these outputs were changing all the time, switching between different sets of visuals. It was totally unpredictable which visuals would be chosen, since the choice depended on a counter of accelerations reached over a certain time, with the limit value set by Wekinator and changing constantly depending on where the user was positioned.
In set 02 of the training data I chose to train on just position x.
18,19 and 20. Gestures.
The gestures are calculated from extreme Kinect depth values at the top, bottom, left or right. When detected, they directly control all of the graphic positions.
The gestures were probably the best visual-cue feature in the work; everyone loved them. They were triggered by the distance between the extreme depth pixel readings and the average of the depth pixels picked up, and they felt immediate. Developing them further on the graphics side, it would be a nice feature if they changed the size of the graphics when activated. For the sound, it would be nice if they changed which generative sounds are chosen; currently they just trigger generative sounds. Using Machine Learning to generate the gestures could be possible, but you would need a visual cue for the gestures, as the current data inputs are not so much time-related as visually related. The current visual traits could possibly work, but not with a Kinect.
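One possible reconstruction of that compass-style gesture check is sketched below; it only assumes a list of tracked depth-pixel positions and a distance threshold, both of which are my own stand-ins rather than the installation's exact code or values.

```java
// One way to derive a compass-style gesture (N/S/E/W) from the tracked depth pixels:
// compare the extreme pixel positions with the average position.
String detectGesture(ArrayList<PVector> depthPixels, float threshold) {
  if (depthPixels.isEmpty()) return "NONE";
  PVector avg = new PVector();
  float minX = Float.MAX_VALUE, maxX = -Float.MAX_VALUE;
  float minY = Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
  for (PVector p : depthPixels) {
    avg.add(p);
    minX = min(minX, p.x);  maxX = max(maxX, p.x);
    minY = min(minY, p.y);  maxY = max(maxY, p.y);
  }
  avg.div(depthPixels.size());
  // A gesture is registered when an extreme pixel sits far enough from the average.
  if (maxX - avg.x > threshold) return "EAST";
  if (avg.x - minX > threshold) return "WEST";
  if (avg.y - minY > threshold) return "NORTH";   // assuming y grows downwards in the image
  if (maxY - avg.y > threshold) return "SOUTH";
  return "NONE";
}
```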
Instructions for compiling and running your project.
The setup is as follows.
What software tools were used?
Data inputs: Kinect input via Processing and OSC
Data outputs: all outputs sent to Wekinator Helper and Wekinator
Wekinator outputs: sent to MAXMSP and Processing for sound and graphics outputs
The presence of a person is detected by a Kinect using infrared dots. A Kinect is needed to run the project properly via Processing; however, mouse inputs can be used for testing purposes, and a Processing sketch with a mouse setup for generating the outputs has also been made. The Kinect passes on positional, velocity and acceleration data, as well as gesture data.
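For reference, velocity and acceleration can be derived from successive position readings by simple finite differences, as in the sketch below; this is my illustration of the idea, and the function name is hypothetical rather than taken from the project.

```java
// Sketch: derive velocity and acceleration magnitudes from successive position readings.
PVector prevPos = new PVector();
PVector prevVel = new PVector();

// Call once per frame with the person's current position; dt is the frame time in seconds.
float[] motionFeatures(PVector pos, float dt) {
  PVector vel = PVector.sub(pos, prevPos).div(dt);   // change in position per second
  PVector acc = PVector.sub(vel, prevVel).div(dt);   // change in velocity per second
  prevPos = pos.copy();
  prevVel = vel.copy();
  // absolute velocity and acceleration values, like the installation's inputs
  return new float[] { vel.mag(), acc.mag() };
}
```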
Next, the outputs from the Kinect or the mouse feed into the Wekinator Helper, which in turn feeds into Wekinator. The Wekinator Helper adds on 4 output values:
Accel Max. Value over the last 10 readings
Acceleration Standard Deviation
Position X and Y Standard Deviation
which are all used in the Wekinator main project.
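The kind of windowed values listed above can be computed like this (a sketch of the general idea, not the Wekinator Input Helper's own code): keep the last 10 readings and take their maximum and standard deviation.

```java
// Rolling window of the last 10 readings, with max and standard deviation.
float[] window = new float[10];
int count = 0;       // how many readings we have so far (up to 10)
int writeIndex = 0;  // ring-buffer position

void addReading(float value) {
  window[writeIndex] = value;
  writeIndex = (writeIndex + 1) % window.length;
  count = min(count + 1, window.length);
}

float windowMax() {
  float m = -Float.MAX_VALUE;
  for (int i = 0; i < count; i++) m = max(m, window[i]);
  return count > 0 ? m : 0;
}

float windowStdDev() {
  if (count == 0) return 0;
  float mean = 0;
  for (int i = 0; i < count; i++) mean += window[i];
  mean /= count;
  float sumSq = 0;
  for (int i = 0; i < count; i++) sumSq += (window[i] - mean) * (window[i] - mean);
  return sqrt(sumSq / count);
}
```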
After that, all Wekinator outputs (shown in the machine learning section) feed back into the graphics display patch (in Processing), which displays all the particle swarm systems and also re-sends the outputs as inputs to the sound control module in MAXMSP. Two training sets of data were used in the final project.
The Graphics
The graphics were based upon a Shiffman swarm algorithm (https://processing.org/examples/flocking.html), modified to have different behaviours depending on the last distance the particle travelled and the distance from the current particle position to the target position. The particles could apply different kinematics solutions depending on the answers to these two questions. On top of this single swarming solution, multiple swarms were coded with different kinematics so as to look completely different from the original particle swarms.
The Processing sketch consists of 5 particle swarms called:
Ameoba02Vehicle: cell like but big
AmeobaVehicle: cell like but small
BrightVehicle: tadpole like
DarkVehicle: small black particles
EraticVehicle: like a “firework” pattern that uses a lissajous pattern
These are triggered by the Graphics Trigger Values, which in turn trigger counters for all the graphics that keep getting updated. However, when they are seen is decided by the “Mix up Graphics” value, which keeps changing its mind about the trigger values.
If the person does not move, or the distance to the target is within a certain range that has been set for each swarm, the particles are deleted. If the distance to the target is within another range, all the particles switch to a random-target algorithm.
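A compressed sketch of that per-swarm housekeeping follows, reusing the SimpleVehicle class from the earlier sketch; the distance ranges (15 and 120 pixels) are placeholders, since the real sketch sets different ranges for each swarm type.

```java
// Simplified housekeeping per swarm: delete particles that are close (or when the person
// is idle), wander randomly inside a middle band, otherwise chase the person's position.
void updateSwarm(ArrayList<SimpleVehicle> swarm, PVector target, boolean personMoving) {
  for (int i = swarm.size() - 1; i >= 0; i--) {
    SimpleVehicle v = swarm.get(i);
    float d = PVector.dist(v.pos, target);
    if (!personMoving || d < 15) {
      swarm.remove(i);                     // close enough (or person idle): delete the particle
    } else if (d < 120) {
      // within the "wander" band: give the particle a random target instead
      v.seek(new PVector(random(width), random(height)));
    } else {
      v.seek(target);                      // otherwise chase the person's position
    }
  }
}
```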
Every minute, the Wekinator training set was switched between the two training sets by the graphics control and display patch made in Processing:
sketch_TEST33A_20inputs_WEK_INPUTOSC
WEK_TRAIN08D: Training data set 01
WEK_TRAIN09C: Training data set 02
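One way to realise the minute-by-minute switch is sketched below: run the two trained Wekinator projects on two different input ports and alternate which one receives the Kinect features. This is an assumption about how such a switch could be implemented, not a description of the actual patch; the ports and addresses (other than Wekinator's default /wek/inputs on 6448) are placeholders.

```java
// Alternate between two Wekinator projects every 60 seconds (assumed setup).
import oscP5.*;
import netP5.*;

OscP5 osc;
NetAddress wekSet01, wekSet02;
boolean useSet01 = true;
int lastSwitchMs = 0;

void setup() {
  osc = new OscP5(this, 9000);                    // local port for sending
  wekSet01 = new NetAddress("127.0.0.1", 6448);   // WEK_TRAIN08D (assumed port)
  wekSet02 = new NetAddress("127.0.0.1", 6449);   // WEK_TRAIN09C (assumed port)
}

void sendInputs(float[] features) {
  if (millis() - lastSwitchMs > 60000) {          // every minute, swap training sets
    useSet01 = !useSet01;
    lastSwitchMs = millis();
  }
  OscMessage m = new OscMessage("/wek/inputs");   // Wekinator's default input address
  for (float f : features) m.add(f);
  osc.send(m, useSet01 ? wekSet01 : wekSet02);
}
```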
The MAXMSP Sound Control:
All the sketches are contained within the “SALL” patch.
Which consists of 8 patches:
S_CONDUCTOR1 – decides which sounds to play and for how long (from S01 to S06). Gets readings from Wekinator. All of the generative sounds and samples only play for “x” seconds if called by S_CONDUCTOR1.
The following sounds and samples can be called:
S01_STARTA – A 16-voice poly patch that uses noise with a reson filter to output random noise values.
S00_FOURIER – Uses noise in a Fourier transform to output discrete packets of noise that slowly dissipate.
S02_ACCEL – Another filter (comb~) that uses a sample input to mix two timings of the sample.
S03_SAMLPLES – Uses 9 samples in the patch and mixes them together with different timings.
S04_ACCEL – Uses a fffb~ filter (a subtractive filter) with pink noise to give discrete packets of noise, similar to the S01_STARTA patch.
S05_ACCEL – Based on the fm-surfer patch; a frequency modulation synthesizer.
S06_SAMPLES_RESTART – Uses another set of 9 samples in the patch and mixes them together with different timings.
Max code related origins:
The code for sound was based upon :
S00_FOURIER :This code was partially based on The forbidden planet patch
S01_STARTA patch:This code was partially based on Polyphony Tutorial 1: Using the poly~ Object
S02_ACCEL patch: This code was based on MSP Delay Tutorial 6: Comb Filter
S04_ACCEL patch:This code was based on Filter Tutorial 4: Subtractive Synthesis
S05_ACCEL patch : This code was partially based on FM-SURFER patch.
S03_SAMLPLES & S06_SAMPLES_RESTART: This code was partially based on the Sampling Tutorial 4: Variable-length Wavetable
03_SAMLPLES_REV & 06_SAMLPLES_REV: This code was partially based on the Sampling Tutorial 4: Variable-length Wavetable
S_CONDUCTOR1 patch: My code.
Summary of running the installation:
A reflection on how successful your project was in meeting your creative aims.
The project more than matched my aims, as for a lot of the results I can't predict the outcome: it's too complex. The difference between a machine learning environment and a purely programmed one is fascinating: I cannot predict what will happen with machine learning, especially when I start using standard deviation outputs. Adding in this complexity has given the result a different feeling. It seems to have a nature of its own; the graphics seem more alive to me. For me the results of the sound were amazing as well for two weeks' work. They blended brilliantly. The samples would kick in when the person stopped moving, and the generative sound would kick in when acceleration values above a limit were detected. Combined with the S_CONDUCTOR1 patch, which decided when and how long to play samples for, this gave a very organic feeling to the installation.
What works well?
The best features were the conductor and the gestural features. They were intuitive and well liked by most users. The conductor played the generative music, and played the samples when the person wasn't moving much; it worked perfectly. The conductor decided what samples to play and how long they lasted, depending on the position of the person (either the x or y co-ordinate). This could easily be changed to another parameter in Wekinator. The gestural features were based on collecting the extreme values of the depth readings in terms of their position. From the extreme positions, like a compass (N-S-E-W), I was able to decide where and when the graphics should move if a person made a gesture, and this again worked very well for a relatively simple solution.
What challenges did you face?
The initial challenge I faced was knowing the best way to use Wekinator. It was not so much about Wekinator's algorithms as it was about creating new outputs that came only from Wekinator and were created by Wekinator. It was a steep learning curve to understand that the creative outputs were the most important use of Wekinator. However, once I understood this, the creation of the outputs was relatively straightforward. Obviously, the limits to these outputs are set only by the time it takes to make them and one's imagination. The results are great.
What might you change if you had more time?
Apart from the direct outputs, everything. I would just keep on experimenting until I could envisage as many variations as possible. The only way to really understand the best use of the tool is to keep trying out different solutions.
Style transfer is the technique of recomposing images in the style of other images. These were mostly created using Justin Johnson's code, based on the paper by Gatys, Ecker, and Bethge demonstrating a method for restyling images using convolutional neural networks. Instructions here, and more details here. A gallery with all of these and more style transfers can be viewed here.
Created by Design I/O, World’s Tiniest Violin is a ‘speed project’ that uses Google’s Project Soli – Alpha Dev Kit combined with the Wekinator machine learning tool and openFrameworks to detect small movements that look like someone playing a tiny violin and translate that to control the playback and volume of a violin solo.
The team used the Project Soli openFrameworks example provided with the ofxSoli addon and searched for the signal that seemed to correlate closest with the tiny violin gesture. In this case it was the fine displacement signal, the delta of which they then fed to Wekinator via OSC. Theo (Design I/O) then had to train Wekinator on which types of finger movements corresponded to playing the violin and which ones it should reject. So he recorded a few different finger movements and assigned them the value of 1.0 on the slider. The slider was then set to 0.0 and gestures were recorded that didn't correspond to playing: like pulling your hand away from the sensor, or just holding it there without moving your fingers. After a few minutes of recording these gestures, the 'training' was initiated, and Wekinator was then able to send back an animated value ranging from 0.0 to 1.0 representing how much Theo's hand looked like it was trying to play a tiny violin. The last step was to map that number to the volume of the violin sample being played back by the openFrameworks app.
Created by Philipp Schmitt (with Margot Fabre), ‘Computed Curation’ is a photobook created by a computer. Taking the human editor out of the loop, it uses machine learning and computer vision tools to curate a series of photos from an archive of pictures.