A comprehensive framework for building and optimizing complex programs and algorithms through true differentiable programming. It enables gradient-based optimization of arbitrary computational graphs, from simple mathematical functions to neural networks and dynamic systems.
The Differentiable Programming Framework provides a complete ecosystem for gradient-based optimization of computational programs. Unlike traditional deep learning frameworks that focus solely on neural networks, this framework extends automatic differentiation to arbitrary algorithms, control systems, and mathematical operations. The core innovation lies in treating entire programs as differentiable computational graphs, enabling gradient-based optimization of complex systems that were previously inaccessible to such methods.
Key motivations include enabling research in algorithm learning, optimizing complex systems with gradient information, and providing a unified framework for differentiable programming across mathematical optimization, machine learning, and control theory domains.
The framework follows a computational graph architecture where every operation builds a directed acyclic graph (DAG) of computations. The system workflow operates as follows:
Input Tensors → Computational Graph Construction → Forward Pass → Result
                                                                    ↓
                 Gradient Computation ← Backward Pass ← Loss Calculation
                         ↓
                 Parameter Updates via Optimizers
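In code, one trip around this loop might look like the following minimal sketch (illustrative only, assuming the Tensor, MLP, and Adam interfaces shown in the Quick Start section below):

from differentiable_programming.core.tensor import Tensor
from differentiable_programming.models.nn import MLP
from differentiable_programming.algorithms.optimizers import Adam

x = Tensor([[0.5, -1.2]])                       # input tensors
target = Tensor([[1.0]])

model = MLP([2, 8, 1], activation='relu')       # parameters to be optimized
optimizer = Adam(model.parameters(), lr=0.001)

prediction = model(x)                           # graph construction + forward pass
loss = ((prediction - target) ** 2).mean()      # loss calculation
optimizer.zero_grad()
loss.backward()                                 # backward pass: gradient computation
optimizer.step()                                # parameter updates via the optimizer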
The core architecture components include:
differentiable_programming/
├── core/ # Automatic differentiation engine
│ ├── tensor.py # Tensor class with computational graph
│ ├── autodiff.py # Function base class and backward pass
│ └── operations.py # Mathematical operations with gradients
├── algorithms/ # Optimization and control algorithms
│ ├── optimizers.py # SGD, Adam, RMSprop
│ ├── solvers.py # GradientDescent, NewtonMethod, ConjugateGradient
│ └── controls.py # PID, LQR controllers
├── models/ # Pre-built differentiable models
│ ├── nn.py # Neural network layers
│ └── dynamics.py # Physical system dynamics
├── utils/ # Visualization and data utilities
│ ├── visualizer.py # Computational graph visualization
│ └── data_loader.py # Data loading and batching
└── examples/ # Comprehensive usage examples
├── simple_optimization.py
├── neural_network.py
└── pendulum_control.py
- Core Computation: NumPy for numerical operations
- Automatic Differentiation: Custom reverse-mode AD implementation
- Visualization: Matplotlib, Graphviz for computational graphs
- Machine Learning: scikit-learn for data utilities
- Testing: pytest, coverage tools
- Packaging: setuptools for distribution
The framework implements reverse-mode automatic differentiation using the chain rule. For a composite function y = f(g(x)), the chain rule gives dy/dx = f'(g(x)) * g'(x); reverse mode applies these local derivatives from the output back toward the inputs, accumulating each intermediate tensor's gradient along the way.
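As a standalone illustration (plain NumPy, independent of the framework's API), the chain rule for w = log(x*y + x^2) can be applied by hand and checked against a central-difference estimate:

import numpy as np

def f(x, y):
    return np.log(x * y + x ** 2)

x, y = 2.0, 3.0
z = x * y + x ** 2                  # forward pass: z = 10
dw_dz = 1.0 / z                     # local derivative of log at z
dw_dx = dw_dz * (y + 2 * x)         # chain rule: (y + 2x) / z = 0.7
dw_dy = dw_dz * x                   # chain rule: x / z = 0.2

eps = 1e-6                          # central-difference sanity check
assert abs(dw_dx - (f(x + eps, y) - f(x - eps, y)) / (2 * eps)) < 1e-6
assert abs(dw_dy - (f(x, y + eps) - f(x, y - eps)) / (2 * eps)) < 1e-6
print(dw_dx, dw_dy)                 # 0.7 0.2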
Each operation in the computational graph implements both forward pass and gradient computation:
class Operation:
    def forward(ctx, *inputs):
        # Compute the output and save any values needed for the backward pass
        pass

    def backward(ctx, grad_output):
        # Compute gradients of the output w.r.t. each input, given grad_output
        pass
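For example, a multiplication operation could pair the two passes as follows. This is a minimal, self-contained sketch rather than the framework's actual operations.py; in particular, the bare Ctx container stands in for whatever context object the framework actually passes in:

import numpy as np

class Ctx:
    # Placeholder context object used to pass saved values from forward to backward
    pass

class Multiply(Operation):
    def forward(ctx, a, b):
        # Save both inputs; each one appears in the other's gradient
        ctx.saved = (a, b)
        return a * b

    def backward(ctx, grad_output):
        a, b = ctx.saved
        # d(a*b)/da = b and d(a*b)/db = a, each scaled by the incoming gradient
        return grad_output * b, grad_output * a

ctx = Ctx()
out = Multiply.forward(ctx, np.array([2.0]), np.array([3.0]))   # [6.]
grads = Multiply.backward(ctx, np.array([1.0]))                 # ([3.], [2.])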
The Adam optimizer combines momentum and adaptive learning rates. With gradient g_t at step t, decay rates beta1 and beta2, and step size lr:

m_t = beta1 * m_{t-1} + (1 - beta1) * g_t
v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2
m_hat = m_t / (1 - beta1^t),   v_hat = v_t / (1 - beta2^t)
theta_t = theta_{t-1} - lr * m_hat / (sqrt(v_hat) + eps)
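For reference, a single Adam step can be written in a few lines of plain NumPy (a standalone sketch, not the framework's optimizers.py; the defaults mirror the configuration listed further below):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction (matters in the first steps), then the parameter update
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 5001):
    grad = 2 * theta                 # gradient of f(theta) = ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)                         # close to [0, 0]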
For optimal control problems, the framework implements LQR control using the Riccati equation. For linear dynamics dx/dt = A x + B u with quadratic cost ∫(x^T Q x + u^T R u) dt, the optimal feedback is u = -K x with K = R^{-1} B^T P, where P solves the algebraic Riccati equation:

A^T P + P A - P B R^{-1} B^T P + Q = 0
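In discrete time, the gain can be obtained by iterating the Riccati recursion to a fixed point. The sketch below is plain NumPy, not the framework's controls.py, and the double-integrator matrices A, B, Q, R are purely illustrative:

import numpy as np

def lqr_gain(A, B, Q, R, iterations=500):
    # Iterate the finite-horizon Riccati recursion; P approaches the infinite-horizon solution
    P = Q.copy()
    for _ in range(iterations):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])   # position/velocity double integrator
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])                 # state cost matrix
R = np.array([[0.01]])                  # control cost matrix
K = lqr_gain(A, B, Q, R)
u = -K @ np.array([0.5, 0.0])           # control law u = -Kx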
- True Differentiable Programming: Automatic differentiation for arbitrary computational graphs beyond neural networks
- Comprehensive Operation Set: Mathematical operations, linear algebra, slicing, reshaping with full gradient support
- Advanced Optimizers: SGD with momentum, Adam, RMSprop with customizable parameters
- Numerical Solvers: Gradient descent, Newton's method, conjugate gradient for system optimization
- Control Systems: PID and LQR controllers with differentiable dynamics
- Neural Network Components: Linear layers, activation functions, multi-layer perceptrons
- Physical System Modeling: Differentiable pendulum, cartpole, and drone dynamics
- Visualization Tools: Computational graph visualization and training monitoring
- Extensible Architecture: Easy addition of new operations and algorithms
- Comprehensive Examples: From simple optimization to complex control problems
Install the framework and all dependencies with these steps:
# Clone the repository
git clone https://github.com/mwasifanwar/differentiable-programming.git
cd differentiable-programming
# Create and activate a virtual environment (recommended)
python -m venv diffprog_env
source diffprog_env/bin/activate # Windows: diffprog_env\Scripts\activate
# Install core dependencies
pip install -r requirements.txt
# Install the package in development mode
pip install -e .
# Verify installation
python -c "import differentiable_programming as dp; print('Framework successfully installed!')"
For development or building the documentation, install the optional extras:
pip install -e ".[dev]"
pip install -e ".[docs]"
import differentiable_programming as dp
from differentiable_programming.core.tensor import Tensor

x = Tensor([2.0], requires_grad=True)
y = Tensor([3.0], requires_grad=True)

z = x * y + x ** 2
w = z.log()
w.backward()

print(f"x gradient: {x.grad}")  # dw/dx = (y + 2x) / z = 7/10 = 0.7
print(f"y gradient: {y.grad}")  # dw/dy = x / z = 2/10 = 0.2
from differentiable_programming.models.nn import MLP
from differentiable_programming.algorithms.optimizers import Adam
from differentiable_programming.utils.data_loader import DataLoader, Dataset

model = MLP([10, 64, 32, 1], activation='relu')
optimizer = Adam(model.parameters(), lr=0.001)
# dataloader: a DataLoader yielding (batch_x, batch_y) pairs from a Dataset

for epoch in range(100):
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        predictions = model(batch_x)
        loss = ((predictions - batch_y) ** 2).mean()
        loss.backward()
        optimizer.step()
# Run all examples
python main.py --example all

# Run individual examples
python main.py --example pendulum
python main.py --example mnist
python main.py --example rosenbrock

# Or run the example scripts directly
python examples/neural_network.py
python examples/pendulum_control.py
- requires_grad: Enable gradient computation (default: False)
- dtype: Data type for tensor operations (default: float32)
- SGD: learning_rate (0.01), momentum (0.0), weight_decay (0.0)
- Adam: learning_rate (0.001), betas ((0.9, 0.999)), eps (1e-8), weight_decay (0.0)
- RMSprop: learning_rate (0.01), alpha (0.99), eps (1e-8), momentum (0.0)
- PID Controller: kp (1.0), ki (0.0), kd (0.0), dt (0.01)
- LQR Controller: Q (state cost matrix), R (control cost matrix)
- Pendulum: mass (1.0), length (1.0), gravity (9.81), damping (0.1)
- CartPole: cart_mass (1.0), pole_mass (0.1), pole_length (1.0)
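As a usage sketch, these defaults can be overridden at construction time. The snippet below is illustrative only: the import paths, the PIDController and Pendulum class names, and the keyword names are assumptions based on the lists above and the Quick Start, and may not match the real API exactly.

from differentiable_programming.algorithms.optimizers import SGD
from differentiable_programming.algorithms.controls import PIDController   # hypothetical class name
from differentiable_programming.models.dynamics import Pendulum            # hypothetical class name

optimizer = SGD(model.parameters(), learning_rate=0.01, momentum=0.9)      # 'model' as in the Quick Start
controller = PIDController(kp=2.0, ki=0.1, kd=0.5, dt=0.01)                # stiffer gains than the defaults
pendulum = Pendulum(mass=1.0, length=1.0, gravity=9.81, damping=0.1)       # the listed physical defaults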
differentiable_programming/
├── core/ # Core automatic differentiation engine
│ ├── __init__.py
│ ├── tensor.py # Tensor class with computational graph
│ ├── autodiff.py # Function base class and context
│ └── operations.py # Mathematical operations with gradients
├── algorithms/ # Optimization and control algorithms
│ ├── __init__.py
│ ├── optimizers.py # SGD, Adam, RMSprop optimizers
│ ├── solvers.py # Numerical optimization solvers
│ └── controls.py # Control system algorithms
├── models/ # Differentiable models and components
│ ├── __init__.py
│ ├── nn.py # Neural network layers and architectures
│ └── dynamics.py # Physical system dynamics models
├── utils/ # Utility functions and tools
│ ├── __init__.py
│ ├── visualizer.py # Graph visualization and monitoring
│ └── data_loader.py # Data loading and preprocessing
├── examples/ # Comprehensive usage examples
│ ├── __init__.py
│ ├── simple_optimization.py # Mathematical optimization examples
│ ├── neural_network.py # ML and neural network examples
│ └── pendulum_control.py # Control system examples
├── tests/ # Unit tests and validation
│ ├── test_tensor.py
│ ├── test_operations.py
│ └── test_optimizers.py
├── requirements.txt # Python dependencies
├── setup.py # Package installation script
└── main.py # Command-line interface
The framework successfully solves complex optimization problems:
- Quadratic Optimization: Converges to global minimum in 10-50 iterations with gradient descent
- Rosenbrock Function: Handles non-convex optimization with adaptive learning rates
- Newton's Method: Achieves quadratic convergence for well-conditioned problems
On standard machine learning tasks:
- Linear Regression: Achieves near-perfect weight recovery with small datasets
- Classification: 85-95% accuracy on synthetic classification tasks with MLPs
- Training Stability: Smooth convergence with appropriate learning rate scheduling
For dynamic system control:
- Pendulum Swing-up: Successfully stabilizes inverted pendulum from random initial conditions
- CartPole Balancing: Maintains pole upright position with LQR control
- Convergence Time: Most control tasks converge within 2-5 seconds of simulated time
- Forward Pass: Comparable to NumPy operations with minimal overhead
- Backward Pass: Efficient gradient computation through computational graph traversal
- Memory Usage: Minimal overhead for gradient storage and computation
- Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2018). Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research, 18(1), 5595-5637.
- Griewank, A., & Walther, A. (2008). Evaluating derivatives: principles and techniques of algorithmic differentiation. SIAM.
- Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Nocedal, J., & Wright, S. J. (2006). Numerical optimization. Springer Science & Business Media.
- Anderson, B. D., & Moore, J. B. (2007). Optimal control: linear quadratic methods. Courier Corporation.
- Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32.
- Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Zheng, X. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
This framework builds upon fundamental research in automatic differentiation and optimization theory. Special thanks to:
- The NumPy community for providing the foundational numerical computation library
- Researchers in automatic differentiation for establishing the mathematical foundations
- The open-source machine learning community for inspiration and best practices
- Contributors to mathematical optimization and control theory literature
M Wasif Anwar
AI/ML Engineer | Effixly AI
For questions, issues, or contributions, please open an issue or pull request on the GitHub repository.