Neural Network Cost Explorer

Use this page to see what a neural network is made of. Change the number of inputs, hidden layers, and outputs, then watch how the parameter count changes. The page connects those parameters back to the familiar maths idea y = ax + b, where weights are like a and biases are like b. The examples below show how a tiny classifier calculates a result, and how a larger network can recognise simple letters.

Network shape: Inputs, hidden layers, and outputs determine how many connections the model has.
Parameter cost: Each weight and bias must be stored, trained, and used whenever the model runs.
Maths connection: Neural networks reuse the idea of y = ax + b many times across many neurons.
Learning example: The letter demo shows how pixels become signals, scores, and a final prediction.

Change the network structure

Type layer sizes separated by commas. Example: 8, 16, 8
Cost message: Your current network is small.
  • Wider layers add many more connections.
  • More hidden layers increase depth.
  • Both usually increase training and inference cost.
  • For image or letter recognition, the input layer often grows quickly because each pixel can become an input value.
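
The effect of wider and deeper layers on the parameter count can be sketched in a few lines. This is a minimal sketch that assumes every layer is fully connected to the next and that the typed list holds the hidden layer sizes; the function names and the input and output counts (4 and 3) are hypothetical and are not taken from the page's own code.

```python
def parse_layer_sizes(text):
    """Turn a string such as "8, 16, 8" into [8, 16, 8]."""
    return [int(part) for part in text.split(",") if part.strip()]


def count_parameters(n_inputs, hidden_sizes, n_outputs):
    """Count weights and biases, assuming fully connected layers."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per neuron in the receiving layer
    return total


hidden = parse_layer_sizes("8, 16, 8")
print(count_parameters(4, hidden, 3))    # 347
print(count_parameters(25, hidden, 3))   # 515: a 5x5 pixel grid gives 25 inputs
```

Under these assumptions, going from 4 inputs to a 5x5 pixel grid changes only the first layer's weights (4*8 becomes 25*8), yet the total rises from 347 to 515.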

Cost summary

The summary reports the layer sizes, total parameters, approximate memory (FP32), relative compute cost, depth, and a simple size label.
  • A parameter is a weight or a bias
  • Weight is like a
  • Bias is like b
  • More parameters = more cost
  • Deeper network = more layers
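
Some of the summary figures can be estimated by hand. The sketch below assumes fully connected layers and 4 bytes per FP32 parameter. Using the number of multiply-adds in one forward pass as a stand-in for compute cost is an assumption made here for illustration; the page's own "relative compute cost" formula is not stated. The function name `dense_costs` is hypothetical.

```python
BYTES_PER_FP32 = 4  # a 32-bit float occupies 4 bytes


def dense_costs(sizes):
    """Estimate costs for fully connected layers.

    `sizes` lists every layer in order, including input and output,
    for example [4, 8, 16, 8, 3].
    """
    weights = sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))
    biases = sum(sizes[1:])  # the input layer has no biases
    parameters = weights + biases
    return {
        "parameters": parameters,
        "memory_bytes": parameters * BYTES_PER_FP32,
        # Assumed proxy: one multiply-add per weight in a forward pass.
        "multiply_adds": weights,
    }


print(dense_costs([4, 8, 16, 8, 3]))
# {'parameters': 347, 'memory_bytes': 1388, 'multiply_adds': 312}
```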

Parameters and y = ax + b

In traditional maths, a straight line is often written as y = ax + b. The a controls how strongly the input x changes the output y, and the b shifts the result up or down.

A neural network neuron uses the same idea, just repeated many times: each connection has a weight, and each neuron has a bias. During training, the network searches for useful values for those weights and biases.

Traditional: y = a*x + b
One neuron: output = activation(w*x + b)
Many inputs: output = activation(w1*x1 + w2*x2 + ... + b)
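
The same formulas can be written as a short function. This is a minimal sketch: the function name `neuron`, the choice of sigmoid as the default activation, and the example numbers are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Squeeze any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute activation(w1*x1 + w2*x2 + ... + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# With an identity activation and a single input, this is exactly y = a*x + b.
print(neuron([3.0], [2.0], 1.0, activation=lambda z: z))  # 7.0

# Two inputs with the default sigmoid activation: z = 0.5*1 - 1*2 + 0.25 = -1.25
print(neuron([1.0, 2.0], [0.5, -1.0], 0.25))
```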

Network picture

Lines represent weighted connections. Nodes represent neurons. The drawing is capped in size so that large networks stay easy to view.

Small deep learning example: letter recognition

Dark cell = pixel turned on
Every pixel is connected to the next layer
Blue glow = stronger active signal
1. Input image: A small grid stores the letter as pixel values.
2. Hidden layers: Every input pixel connects to the next layer, even white pixels.
3. Output scores: Each output node represents one possible letter.
4. Prediction: The highest score becomes the recognised letter.
Example shown: the network reads a small pixel grid and predicts which letter it is.
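
The four steps above can be sketched as a forward pass in plain Python. Everything specific here is a hypothetical stand-in: the 5x5 grid, the three candidate letters, the single hidden layer of 4 neurons, and above all the weights, which are random and untrained. The predicted letter is therefore arbitrary; the sketch only shows how pixels flow through to scores and a prediction.

```python
import math
import random

LETTERS = ["T", "L", "I"]  # hypothetical output classes

GRID = [  # '#' = pixel turned on, '.' = white pixel
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def to_pixels(grid):
    """Step 1: flatten the grid into input values (1.0 on, 0.0 off)."""
    return [1.0 if ch == "#" else 0.0 for row in grid for ch in row]


def make_layer(n_in, n_out, rng):
    """Random, untrained weights: one row of n_in weights per neuron."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, activation):
    """Every input connects to every neuron, including white (0.0) pixels."""
    weights, biases = layer
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict(grid, hidden_layer, output_layer):
    pixels = to_pixels(grid)                             # step 1: input image
    signals = dense(pixels, hidden_layer, sigmoid)       # step 2: hidden layer
    scores = dense(signals, output_layer, lambda z: z)   # step 3: output scores
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LETTERS[best], scores                         # step 4: prediction


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden_layer = make_layer(25, 4, rng)
output_layer = make_layer(4, len(LETTERS), rng)
letter, scores = predict(GRID, hidden_layer, output_layer)
print(letter, [round(s, 2) for s in scores])
```

A trained network would have the same structure; training would replace the random weights and zero biases with values that make the correct letter's score the highest.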

Smallest useful network example

The smallest useful neural network can be just one input and one output neuron. It is not deep yet, but it uses the same parameter idea as larger networks: one weight and one bias. Here it predicts whether a student is likely to pass from the number of study hours.

Input:
  x = study hours

Parameters learned by training:
  weight w = 2
  bias b = -5

Neuron calculation:
  z = w*x + b
  score = sigmoid(z)

Decision rule:
  if score >= 0.5, predict Pass
  otherwise, predict Not pass

What sigmoid means:
  sigmoid(z) = 1 / (1 + e^(-z))
  It squeezes any number into a score between 0 and 1.

Example 1: x = 3 hours
  z = 2*3 - 5 = 1
  score = sigmoid(1)
  score = 1 / (1 + e^(-1))
  score = 1 / (1 + 0.37) = 0.73
  result = Pass

Example 2: x = 2 hours
  z = 2*2 - 5 = -1
  score = sigmoid(-1)
  score = 1 / (1 + e^(1))
  score = 1 / (1 + 2.72) = 0.27
  result = Not pass

Parameter count:
  1 weight + 1 bias = 2 parameters

Study hours x | z = 2*x - 5 | sigmoid(z) | Prediction
3             | 1           | 0.73       | Pass
2             | -1          | 0.27       | Not pass

This is the same shape as y = ax + b: the weight is like a, the bias is like b, and sigmoid turns the raw result z into a probability-like score between 0 and 1.
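
The worked example can be checked with a few lines of Python. This is a minimal sketch that uses the weight, bias, and decision rule given above; the function name `predict_pass` is hypothetical.

```python
import math

W = 2.0   # weight, like a in y = ax + b
B = -5.0  # bias, like b in y = ax + b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_pass(hours):
    """Return (z, score, label) for a given number of study hours."""
    z = W * hours + B
    score = sigmoid(z)
    label = "Pass" if score >= 0.5 else "Not pass"
    return z, score, label


for hours in (3, 2):
    z, score, label = predict_pass(hours)
    print(hours, z, round(score, 2), label)
# 3 1.0 0.73 Pass
# 2 -1.0 0.27 Not pass
```

Because sigmoid(0) = 0.5, the decision flips exactly where z = 0, that is at x = 2.5 study hours for these parameter values.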