I’ve been running AI models locally for a few months now, and it has become one of the more useful additions to my projects. The main reason is simple: I don’t want to send sensitive or personal data to external servers if I don’t have to. Running models locally also removes the network round trip on every request and gives me more control over how things work.
One of the first times I tried this was when I wanted to predict energy usage patterns based on past data. Instead of sending everything to a cloud service, I trained a small model and ran it directly on my homelab. It worked well enough for my needs and didn’t require any internet connection after the model was set up.
The biggest practical step was converting the model into ONNX format and running it with ONNX Runtime. This made it much easier to run on different devices without depending on a specific machine learning framework. Here’s a basic example of how I load and use the model:
```python
import onnxruntime as ort
import numpy as np

# Load the model
session = ort.InferenceSession("energy_model.onnx")

# Prepare input data: a single row of five readings, as float32
input_data = np.array([[12.5, 8.2, 15.3, 9.8, 14.1]], dtype=np.float32)

# Run inference; the key "input" must match the input name stored in the model
outputs = session.run(None, {"input": input_data})
predicted = outputs[0].flatten()[0]
print(f"Predicted usage: {predicted:.2f}")
```
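The example above hard-codes one row of five values. As a rough sketch of how that row might be assembled from a longer history, assuming the model simply takes the five most recent readings (the helper name `latest_window` and the window size are illustrative assumptions, not part of my actual setup):

```python
from typing import Sequence

# Assumption: the model expects the five most recent readings as one row.
WINDOW_SIZE = 5


def latest_window(history: Sequence[float], size: int = WINDOW_SIZE) -> list[list[float]]:
    """Return the last `size` readings as a single-row batch: [[r1, ..., rN]]."""
    if size <= 0:
        raise ValueError("size must be positive")
    if len(history) < size:
        raise ValueError(f"need at least {size} readings, got {len(history)}")
    # Convert every value to float so the row has a uniform numeric type
    return [[float(value) for value in history[-size:]]]
```

The result can be passed straight to `np.array(..., dtype=np.float32)` to get the `(1, 5)` input used above.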
This approach is straightforward. Once the model is converted to ONNX, it runs efficiently even on smaller devices. I didn’t need to install heavy frameworks on every machine, which made deployment simpler.
To make the model easier to use as a service, I also wrapped it in a small API using Flask and put everything inside a Docker container. This way, I could run the same setup on different machines without worrying about dependency issues. Here’s a simplified version of what that looked like:
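```python
from flask import Flask, request, jsonify
import onnxruntime as ort
import numpy as np

app = Flask(__name__)

# Load the model once at startup and reuse the session for every request
session = ort.InferenceSession("energy_model.onnx")


@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body with a list of readings under the "usage" key
    data = request.get_json(silent=True)
    if not data or "usage" not in data:
        return jsonify({"error": "expected a JSON body with a 'usage' list"}), 400
    input_data = np.array([data["usage"]], dtype=np.float32)
    outputs = session.run(None, {"input": input_data})
    return jsonify({"predicted_usage": float(outputs[0].flatten()[0])})


if __name__ == "__main__":
    # Bind to all interfaces so the port is reachable from outside the container
    app.run(host="0.0.0.0", port=5000)
```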
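For completeness, here is a minimal sketch of how a client might call that endpoint using only the standard library. The address `http://localhost:5000` and the helper names `build_predict_request` and `predict_remote` are assumptions for illustration; adjust them to wherever the container is actually published.

```python
import json
import urllib.request

# Assumption: the service is reachable at this address; adjust host and port as needed.
BASE_URL = "http://localhost:5000"


def build_predict_request(readings, base_url=BASE_URL):
    """Build a POST request for /predict with the readings as a JSON body."""
    body = json.dumps({"usage": list(readings)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_remote(readings, base_url=BASE_URL, timeout=5.0):
    """Send the readings to the service and return the predicted usage."""
    request = build_predict_request(readings, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["predicted_usage"]
```

Calling `predict_remote([12.5, 8.2, 15.3, 9.8, 14.1])` against a running instance should return a single number. Building the request is kept separate from sending it so the payload can be checked without a live server.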
Using Docker made it easy to move the service around and keep the environment consistent. It also made testing and updating the model simpler over time.
What I’ve found is that local AI works best when you keep the scope small. Instead of trying to run large, general-purpose models, focusing on one specific task usually gives better results with less effort. It also helps to start with a model that’s already been trained and then optimize it for local use, rather than training everything from scratch on limited hardware.
Running AI locally isn’t always faster or more accurate than cloud solutions, but it gives you more control and removes the need to send data outside your own devices. For many personal or internal projects, that trade-off is worth it.