This software was largely created by AI Vibe Coding
Created by YouMinds
Most people associate human-like machines that can see, speak and reason with brain-like neural networks
that can think and reflect on a problem.
But Artificial Neural Networks (ANNs) are mostly stateless and primarily master one thing: curve fitting.
They map input data to outputs through complex functions, which lets them solve diverse
problems and also underlies large language models (LLMs).
Let’s now demystify this core concept of modern AI.
The problem
Imagine having a set of data points that you want to interpolate between and predict from,
e.g. heart rates and cholesterol levels of patients, together with their probability of a heart attack.
A feedforward neural network is designed to accomplish precisely this task.
It can be trained to fit a mathematical curve to the data points.
It could then determine the risk of heart attack for any patient.
In principle the number of features has no upper limit, i.e. significantly more patient data
such as age, blood sugar, medication, weight, etc. could be included in the prediction.
In this simulation we use random data and you can choose between a 2D and 3D curve.
In 3D mode, the network utilizes two feature inputs, X1 and X2.
In 2D mode, it handles only one feature input.
In both modes the probability Y1 is calculated.
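A minimal sketch of how such random data could be generated. It assumes evenly spaced feature values and targets in the range 0 to 1, as the prompts listed at the end of this page suggest. The function names make_data_2d and make_data_3d are hypothetical and not taken from the simulation.

```python
import random

def make_data_2d(n_points=5, seed=None):
    """One feature X1, evenly spaced in [0, 1], with a random target Y1 in [0, 1)."""
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]
    return [(x, rng.random()) for x in xs]

def make_data_3d(n_per_axis=3, seed=None):
    """Two features X1, X2 on an evenly spaced grid, with a random target Y1 in [0, 1)."""
    rng = random.Random(seed)
    axis = [i / (n_per_axis - 1) for i in range(n_per_axis)]
    return [(x1, x2, rng.random()) for x1 in axis for x2 in axis]

points_2d = make_data_2d(seed=0)  # list of (X1, Y1)
points_3d = make_data_3d(seed=0)  # list of (X1, X2, Y1)
print(points_2d)
print(points_3d)
```

In 2D mode each point carries one feature and one target, in 3D mode two features and one target.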
[Interactive training panel: Loss, Epoch (of 2000), Shuffle, Batch, Learning rate]
Press Create new data to try another set for curve fitting.
Press Start Training to fit the curve to the specified data points.
Watch the loss decrease as the curve fits better
and the weights and biases below update.
If training gets stuck in a solution valley (a local minimum),
change the Learning rate or Batch size to escape from it.
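The Shuffle and Batch settings can be pictured with a small sketch. It assumes the usual mini-batch scheme, in which the data points are optionally shuffled once per epoch and then handed to the optimizer in chunks. The helper mini_batches is hypothetical and only illustrates the idea.

```python
import random

def mini_batches(points, batch_size, shuffle=True, seed=None):
    """Yield the data points in chunks of batch_size, optionally in random order."""
    order = list(points)
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

points = [(0.0, 0.2), (0.25, 0.8), (0.5, 0.4), (0.75, 0.9), (1.0, 0.1)]
batches = list(mini_batches(points, batch_size=2, seed=1))
print([len(b) for b in batches])  # [2, 2, 1]
```

Smaller batches give noisier weight updates, which is one reason a different batch size can help training leave a valley it is stuck in.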
The Neural Network architecture
We use a Multi-Layer Perceptron (MLP) with a hyperbolic tangent activation function (φ = tanh) for curve
fitting, which is shown below on the left. An MLP can be written out
completely as a mathematical formula, which is shown on the right.
The model data (weights and biases) are initially filled with random values. You can recognize them as
blue (positive) and red (negative) numerical values in the formula
and by the thickness of the connecting lines in the network. Thick connectors have a weight of large
magnitude, so they scale the input value more and thus transmit more of the signal.
Use the plus and minus buttons to change the network architecture,
and watch the network formula change accordingly.
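The statement that an MLP is nothing more than a formula can be checked with a short sketch. The weights and biases below are made-up values for a hypothetical network with one input, two hidden neurons and one output, and tanh is assumed on every layer including the output.

```python
import math

# Hypothetical parameters for a 1-2-1 network.
W1 = [0.8, -1.2]   # input -> hidden weights
B1 = [0.1, 0.4]    # hidden biases
W2 = [1.5, -0.7]   # hidden -> output weights
B2 = 0.2           # output bias

def forward_layers(x):
    """Evaluate the network layer by layer."""
    hidden = [math.tanh(w * x + b) for w, b in zip(W1, B1)]
    return math.tanh(sum(w * h for w, h in zip(W2, hidden)) + B2)

def forward_formula(x):
    """The same network written out as one closed-form expression."""
    return math.tanh(
        1.5 * math.tanh(0.8 * x + 0.1)
        - 0.7 * math.tanh(-1.2 * x + 0.4)
        + 0.2
    )

for x in (0.0, 0.5, 1.0):
    print(x, forward_layers(x), forward_formula(x))
```

Both functions return the same value for every input, because the written-out formula is just the layer-by-layer calculation with the parameters substituted in.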
The Neural Network intelligence
The intelligence of an artificial neural network (ANN) is stored in its weights and biases.
These parameters (the model data) are adjusted during training to shape the curve,
minimize the error, and capture patterns and relationships within the data.
Watch how the weights and biases change during training.
You can observe the changes even better if you reduce the learning rate above.
W35, for example, denotes the weight that connects neuron 3 of the previous layer with neuron 5 of the
current layer.
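The naming scheme can be sketched in a few lines. The sketch assumes that neurons are counted from 1 and that the first digit refers to the previous layer, as in the W35 example. The helper weight_labels is hypothetical.

```python
def weight_labels(n_prev, n_curr):
    """Label each weight W<i><j>: from neuron i (previous layer) to neuron j (current layer)."""
    return [f"W{i}{j}" for i in range(1, n_prev + 1) for j in range(1, n_curr + 1)]

labels = weight_labels(n_prev=3, n_curr=5)
print(len(labels))      # 3 * 5 = 15 weights between the two layers
print("W35" in labels)  # True
```

A fully connected layer therefore has one weight per pair of neurons, plus one bias per neuron of the current layer.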
What is curve fitting anyway
Basic Concept
In curve fitting, a model learns to map inputs to outputs by finding a function that best describes the
data. Neural networks do this through layers of neurons, adjusting weights to minimize the error between
predicted and actual values.
Neural networks do not think. Particularly in the context of curve fitting, they are designed to learn and
predict the likelihood of a solution based on math and statistics rather than making absolute decisions.
This highlights their role in function approximation and probabilistic modeling rather than decision-making
in the way humans think.
An Artificial Neural Network (ANN) is like a smart system that learns from data to make predictions. Imagine
you have some data points, and you want to draw a smooth curve that fits these points. An ANN can do this by
using layers of tiny calculators called neurons. Each neuron does a simple math calculation on the inputs X
it receives and passes the result Y to the next layer:
Y = φ(∑_{j=1..n} W_j · X_j + B)
These neurons are connected by weights W, which tell the neuron how important each connection is.
There's also a bias B, which is an extra number added to the calculation to help fine-tune the results.
Neurons also use something called an activation function φ.
Activation functions introduce non-linearity to neural networks by applying a non-linear transformation to
the weighted sum of a neuron's inputs, enabling the network to learn complex patterns and relationships
and to represent intricate decision boundaries.
In this simulation we use the hyperbolic tangent activation, tanh.
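The neuron formula above translates almost literally into code. This is a minimal sketch with made-up input, weight and bias values. The function name neuron is chosen for illustration only.

```python
import math

def neuron(inputs, weights, bias, phi=math.tanh):
    """Compute Y = phi(sum_j W_j * X_j + B) for a single neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return phi(z)

# Two inputs X1 = 0.5 and X2 = -1.0 with weights 0.8 and 0.3 and bias 0.1.
y = neuron(inputs=[0.5, -1.0], weights=[0.8, 0.3], bias=0.1)
print(y)  # tanh(0.5 * 0.8 + (-1.0) * 0.3 + 0.1) = tanh(0.2)
```

Because tanh squashes any weighted sum into the range between -1 and 1, the output stays bounded no matter how large the inputs are.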
Activation Functions Visualizer
As the ANN trains, it adjusts these weights and biases to make better predictions. This process helps the
ANN learn the best way to fit the curve to the data.
So, in essence, ANNs use many simple math steps to learn patterns in data and make predictions, much like
connecting dots to form a curve.
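The training process described above can be sketched without any library. The sketch assumes a network with one input, one hidden tanh layer and a linear output, trained with plain full-batch gradient descent on the mean squared error. The simulation itself may use a different optimizer, batch size and output activation, and the data points here are made up.

```python
import math
import random

def train(points, hidden=4, lr=0.05, epochs=2000, seed=0):
    """Fit y ~ sum_k w2[k] * tanh(w1[k] * x + b1[k]) + b2 by gradient descent."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    n = len(points)
    losses = []
    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in points:
            # Forward pass
            h = [math.tanh(w1[k] * x + b1[k]) for k in range(hidden)]
            pred = sum(w2[k] * h[k] for k in range(hidden)) + b2
            err = pred - y
            loss += err * err / n
            # Backward pass: gradients of the mean squared error
            d_pred = 2.0 * err / n
            gb2 += d_pred
            for k in range(hidden):
                gw2[k] += d_pred * h[k]
                d_z = d_pred * w2[k] * (1.0 - h[k] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                gw1[k] += d_z * x
                gb1[k] += d_z
        losses.append(loss)  # loss measured before this epoch's update
        # Gradient descent step: move every parameter against its gradient
        for k in range(hidden):
            w1[k] -= lr * gw1[k]
            b1[k] -= lr * gb1[k]
            w2[k] -= lr * gw2[k]
        b2 -= lr * gb2
    return losses

points = [(0.0, 0.2), (0.25, 0.8), (0.5, 0.4), (0.75, 0.9), (1.0, 0.1)]
losses = train(points)
print("first epoch loss:", losses[0])
print("last epoch loss: ", losses[-1])
```

The loss of the last epoch is lower than that of the first, which is all that "learning" means here: the parameters have moved to values that make the curve pass closer to the points.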
Applicability
Image Recognition: Neural networks map pixel values (inputs) to probabilities of object categories
(outputs). Essentially, they fit a complex curve to classify images correctly.
Speech Recognition: Input audio signals are mapped to text outputs, again fitting a function to
translate sound waves into words.
Recommendation Systems: User preferences and interactions are used as inputs to predict future
preferences or actions, which is another form of curve fitting.
Text Prediction: LLMs like GPT-3 fit a function to large text corpora, learning the probability
distribution of words and sequences. This is akin to fitting a curve that predicts the next word given
the context of previous words.
Pattern Recognition: LLMs detect and model complex patterns in text data, capturing context, semantics,
and syntax by fitting intricate functions to language data.
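Several of the applications above end in a probability distribution over categories or next words. A common way to turn raw network outputs into such a distribution is the softmax function. The text above does not name it, so treat it as an assumption about how such models typically work. The vocabulary and scores are made up.

```python
import math

def softmax(scores):
    """Convert arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

vocabulary = ["curve", "line", "cat"]
scores = [2.0, 1.0, -1.0]  # hypothetical raw outputs of a network
probs = softmax(scores)
for word, p in zip(vocabulary, probs):
    print(word, round(p, 3))
```

The highest score receives the highest probability, but every candidate keeps a non-zero chance, which matches the idea of predicting likelihoods rather than making absolute decisions.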
Does this represent intelligence or consciousness
The simplest ANN imaginable is just like the straight-line equation we all learned in high school.
If the ANN uses a linear activation function, training it amounts to finding the best-fitting straight
line for your data.
To see this, assume a linear activation and remove all layers in the simulation except for one with just
one neuron.
This reduces the ANN to a simple straight-line equation:
y = m· x + t
where m corresponds to the weight of the single neuron and t is the bias, which gives us this
single-neuron network formula:
Y = W· X + B
If you keep that in mind, you might also find the answer to whether ANNs can really think or have
consciousness.
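The single-neuron formula Y = W · X + B can be trained in a few lines. This sketch assumes plain gradient descent on the mean squared error and uses made-up data that lies exactly on the line y = 2x + 1. The function name fit_linear_neuron is hypothetical.

```python
def fit_linear_neuron(points, lr=0.1, epochs=2000):
    """Train Y = W * X + B (linear activation) by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Five points on the line y = 2x + 1, with x between 0 and 1.
points = [(i / 4, 2.0 * (i / 4) + 1.0) for i in range(5)]
w, b = fit_linear_neuron(points)
print(w, b)  # W approaches m = 2 and B approaches t = 1
```

The trained weight ends up as the slope and the trained bias as the intercept, so this one-neuron network is ordinary straight-line fitting under another name.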
What's the use of an ANN with just one input and one output
Well noticed, my young Padawan!
In this simulation we deliberately use at most 2 feature inputs and one output, simply because we
can then follow the curve fitting live as a diagram in 2- or 3-dimensional space.
In fact, you could easily expand this simulation to include as many inputs or outputs as you want. This
would not disrupt the mathematics, the learning algorithm used, or the network architecture.
This network could just as easily find feature interactions in multidimensional space, transfer vectors to
higher or lower dimensions, discover clusters, or generate probability distributions of any size.
However, while the network can handle these tasks, practical considerations such as computational resources,
training data quality, and model complexity should be taken into account. Expanding the network to handle
more inputs or outputs may increase the computational load and require careful tuning to ensure effective
learning and generalization.
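How the same mathematics extends to more inputs and outputs can be sketched with one fully connected layer. The example maps a vector of 3 values to a vector of 5 values. All numbers are made up, and the function name dense_layer is hypothetical.

```python
import math

def dense_layer(inputs, weights, biases, phi=math.tanh):
    """Apply one fully connected layer; weights[i][j] connects input i to output neuron j."""
    return [
        phi(sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j])
        for j in range(len(biases))
    ]

x = [0.2, -0.4, 0.9]                      # 3 input features
weights = [[0.1, -0.3, 0.5, 0.2, 0.0],    # one row per input,
           [0.7, 0.4, -0.6, 0.1, 0.3],    # one column per output neuron
           [-0.2, 0.8, 0.3, -0.5, 0.6]]
biases = [0.0, 0.1, -0.1, 0.2, 0.05]
y = dense_layer(x, weights, biases)
print(len(x), "->", len(y))  # 3 -> 5
```

Only the sizes of the weight table and the bias list change with the number of inputs and outputs. The calculation per neuron stays the same.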
Are there other methods for curve fitting
The following mathematical methods provide various approaches to fitting curves to data, each suited to
different types of relationships between variables.
Linear Regression
This method fits a straight line to a set of data points by minimizing the sum of the squares of the
vertical distances (residuals) between the data points and the line.
Polynomial Regression
This method fits a polynomial of degree n to the data points. It generalizes linear regression to allow for
curves.
Least Squares Method
This method minimizes the sum of the squares of the residuals between the data points and the fitting
function. It is the criterion that linear and polynomial regression use.
Spline Interpolation
This method fits piecewise polynomials (splines) to the data points, ensuring smoothness at the joins.
Exponential and Logarithmic Fitting
These methods fit exponential or logarithmic functions to the data, often used for growth or decay patterns.
Fourier Transform
The Fourier Transform, usually computed with the FFT (Fast Fourier Transform), represents data as a sum of
sine and cosine waves. It is highly efficient and suitable for tasks involving periodic or frequency-based
analysis, such as signal processing, audio analysis, and image compression.
While neural networks offer a flexible and powerful tool for complex and high-dimensional data, traditional
mathematical methods remain valuable for simpler or well-defined problems.
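For comparison, linear regression with the least squares method needs no training loop at all, because the best line can be computed directly. A minimal sketch with made-up data points. The function name least_squares_line is hypothetical.

```python
def least_squares_line(points):
    """Return slope m and intercept t of the line that minimizes the sum of squared residuals."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    t = mean_y - m * mean_x
    return m, t

points = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
m, t = least_squares_line(points)
print(m, t)  # approximately 1.94 and 1.09
```

For a simple, well-defined relationship like this one, the closed-form solution is faster and easier to interpret than a trained network.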
But why do Artificial Neural Networks excel
Curve fitting with Artificial Neural Networks (ANNs) is particularly advantageous for handling large
numbers of features compared to classical mathematical methods. Here’s why:
Handling High Dimensionality
Traditional Methods: Classical regression techniques, like linear or polynomial regression, struggle with
high-dimensional data due to the curse of dimensionality. As the number of features (inputs) increases, the
complexity and computational cost of these methods rise significantly. In polynomial regression, for
example, the number of terms grows combinatorially with the number of features.
ANNs: Neural networks are designed to handle high-dimensional data efficiently. They can manage and process
a large number of input features through their multiple layers and neurons, making them suitable for
complex, high-dimensional datasets.
Capturing Nonlinear Relationships
Traditional Methods: Classical regression methods often assume a specific form of the relationship between
inputs and outputs, such as linear or polynomial. These assumptions may not hold true for complex real-world
data, leading to poor performance.
ANNs: Neural networks excel at capturing nonlinear relationships without assuming a predetermined form.
They can learn complex, hierarchical patterns in the data through deep architectures, making them highly
flexible and powerful for modeling intricate relationships.
Feature Interactions
Traditional Methods: Accounting for interactions between features can be challenging with traditional
methods, especially when the number of features is large. Explicitly modeling all possible interactions can
quickly become impractical.
ANNs: Neural networks implicitly learn feature interactions during training. The multiple layers and
connections between neurons allow ANNs to automatically capture and represent intricate interactions
between features, improving their ability to generalize from the data.
Robustness to Noisy Data
Traditional Methods: High-dimensional data often contain noisy or irrelevant features, which can adversely
affect the performance of classical regression methods. Feature selection or dimensionality reduction
techniques are required to mitigate this issue.
ANNs: Neural networks can be more robust to noisy data due to their capacity to learn complex patterns and
to down-weight irrelevant features. Techniques like dropout and regularization help ANNs to generalize
better and avoid overfitting, even in the presence of noise.
Scalability
Traditional Methods: Scaling classical regression methods to handle large datasets with many features
can be computationally expensive and time-consuming.
ANNs: Neural networks are highly scalable and can be trained on large datasets using modern hardware like
GPUs. This scalability allows ANNs to handle vast amounts of data with many feature dimensions efficiently.
In summary, curve fitting with ANNs makes sense for larger numbers of features because ANNs are designed to
handle high-dimensional data, capture nonlinear relationships, learn feature interactions, and cope with noise.
These capabilities enable ANNs to excel in complex real-world applications where classical mathematical
methods may fall short.
Is AI always about curve fitting
No, AI is not always about curve fitting.
AI can be categorized into stateful AI, which maintains and utilizes internal state information for tasks
involving sequential dependencies, and stateless AI, which performs curve fitting to map inputs to outputs
without retaining any state between inputs.
Stateful Networks
Recurrent Neural Networks (RNNs):
Stateful by Design: RNNs maintain a hidden state that is updated at each time step as they process
sequential data. This hidden state carries information from previous time steps, allowing RNNs to capture
temporal dependencies.
Sequential Data Handling: RNNs are specifically designed to handle data where the order of inputs matters,
such as time series, text, and speech.
Applications: RNNs are used in applications like language modeling, speech recognition, and time series
prediction, where understanding context or previous elements is crucial.
Energy-Based Models (EBMs):
Model Dependencies: EBMs model dependencies between variables by associating energy values with different
configurations of those variables. The goal is to learn an energy function that assigns lower energy to more
likely configurations.
Statefulness in Learning: The state in EBMs can be thought of as the configuration of variables that the
model is trying to optimize. This state evolves during the learning process as the model iteratively adjusts
to minimize energy.
Applications: EBMs are used in feature learning, optimization, and data generation tasks. Examples include
Restricted Boltzmann Machines (RBMs) and Hopfield Networks.
Stateless ANNs
Traditional ANNs (including CNNs, Transformers, Autoencoders):
Stateless: These networks do not maintain any internal state between different inputs. For the same input
and given parameters, they will always produce the same output.
Feed-Forward Nature: These networks operate by feeding input data through layers of neurons without any
feedback loops or memory mechanisms.
Applications: Stateless ANNs are used in tasks where each input is independent of others, such as image
classification (CNNs), sequence-to-sequence tasks in which the whole sequence is passed in as a single
input (Transformers), and data reconstruction (Autoencoders).
Stateful Networks (RNNs, EBMs): Maintain and use state information to capture dependencies in data. Ideal
for tasks where the order or relationship between inputs is important.
Stateless Networks (Traditional ANNs, including CNNs, Transformers, Autoencoders): Do not maintain state
information. Ideal for tasks where each input can be processed independently.
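The difference between the two families can be sketched with one neuron of each kind. The weights are made up, and the class RecurrentCell is a hypothetical, heavily simplified stand-in for a real RNN cell.

```python
import math

def feedforward(x, w=0.9, b=0.1):
    """Stateless: the output depends only on the current input and the parameters."""
    return math.tanh(w * x + b)

class RecurrentCell:
    """Stateful: the hidden state h carries information from one step to the next."""
    def __init__(self, w_in=0.9, w_state=0.5, b=0.1):
        self.w_in, self.w_state, self.b = w_in, w_state, b
        self.h = 0.0

    def step(self, x):
        self.h = math.tanh(self.w_in * x + self.w_state * self.h + self.b)
        return self.h

print(feedforward(1.0), feedforward(1.0))  # same input, same output

cell = RecurrentCell()
first = cell.step(1.0)
second = cell.step(1.0)
print(first, second)  # same input, different outputs because h changed in between
```

Calling the stateless function twice with the same input gives the same result, while the recurrent cell answers differently the second time because it remembers the first call.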
How was it built
This software was created using Vibe Coding with a Large Language Model (LLM) chatbot
and then reworked in look and feel.
Some features had to be implemented manually, and
corrections and improvements had to be made.
The following Vibe Coding prompts were used with Copilot:
"create a single html page with tensorflow.js that implements a curve fitting example with 5 random 2d
datapoints. When training the model, use and display a progress bar on the page. display the curve on a
canvas."
"the progress bar does not show any progress and the curve is just a line. Also add a start button."
"I want the curve to hit every point. Also there is no need for the for loop around the fit function
since it has its own epoch parameter already."
"use tanh instead of relu. You need to introduce the data as training data. xs and ys are not set yet."
"why are you using 2d tensors in prepareData. The input and output data of the model is 1d is it not."
"the curve still does not hit the points."
"still not working maybe we need a batchsize equal to the number of datapoints ?."
"reduce the batchsize to 1 and normalize the datapoints to a range of 0 to 1."
"equally distribute the points along the x coordinate. I.e. have equal steps for the x values when
generating the data."
"display the loss value. Also add a graph using chart.js that show the loss over time. Also show the
data points right away. Also display the curve during training so the improvement can be seen."
"create javascript function that takes a model as input und creates the formula that computes the
output for the given input. Since the model has one input value and one output value it should be
something like y = x .... create html output as a return string. For subscript use tags for special
characters use the html character codes."
"This only creates the formula for the last layer. Change it to create the formula for all layers of an
arbitrary model with arbitrary layer counts but unit 1 input and unit 1 output"
"I want to draw the model layer on a canvas. Create a javascript function that takes a model and a div
id for the canvas as input. Let the function draw the model layers on the canvas using a circle of size
20x20 px. draw the layers in a distance of 300 px. Draw the connection between the neurons using a path
function so it shapes a nice curve from one neuron to the next."
"add the input units. also for the connections use a thickness proportional to the corresponding model
weight."
"add a function that triggers when the mouse hovers over a connection with the network layer parameter
and the weight row and column of that connection"
"create a function that draws weights and biases on another canvas with parameter canvasid and model.
Let the function draw weights in a table. Use one table for each layer. spread the layertables
vertically. let the size of the cells be configuarble."
"create a function that draws a clock like display with a given number of pointers on a given canvas.
Let each pointer display a number from 0 to 9. The inner pointer is the smallest with the first tens
place of the given number. the next pointer shows the 100 place of the number and so on."
"clear the background before drawing. Also do not label the pointers. Instead draw a ring of numbers
around each pointer. Align the numbers towards the center"
"add a mouse capture that allow the mouse to grab and drag over the clock and change its value."
"you have a redraw function which makes the code below // Draw the clock-like display redundant"
"make the mouse pointer change its shape to a drag hand."
"I do not want to grab a single pointer. Instead I want to grab the whole clock. When I move down I
want the value to decrease and accelerate the further away the mouse is when I move Up I want the value
to increase and also accelerate the further the mouse is away from the clock center"
"I want the value to accelerate and keep its speed even it the mouse does not move. Also as long as the
mouse is pressed keep changing the value even if the mouse leaves the canvas."
"create a single page html website with javascript and chart.js. draw activations functions on the
chart.js. let the user choose between the different activations functions available for AI."
"highlight the zero center vertical and horizontal lines in the chart"
"Add more function: identity, binary step, smht, gelu, softplus, elu, selu, prelu, silu, elish,
gaussian, Also add a text field the gives a description of the function and especially when it is used."