How to Create a Node.js Proxy Server for Hosting the DeepSeek-R1 7B Model
Large language models are becoming more accessible for local deployment thanks to tools like Ollama, which offers a streamlined approach to hosting powerful AI models such as DeepSeek-R1 7B (released in January 2025), a 7-billion-parameter LLM with performance comparable to OpenAI's o1-mini model. In this post, we'll walk through setting up a simple Node.js proxy server to host the DeepSeek-R1 7B model on an AWS EC2 instance, keeping the model itself hidden from external traffic.
This setup leverages Docker, Node.js, and Ollama's model server, allowing you to keep your AI service accessible on a public subnet while isolating the model backend.
All of the code for this tutorial is available on GitHub.
Why Use a Proxy for Hosting Ollama?
Ollama operates locally by default, running models on localhost. While this keeps the model server unreachable from the outside world, exposing it directly to external clients introduces challenges:
Access Control: Opening Ollama directly to external traffic risks unauthorized access.
Extensibility: A proxy allows you to add rate-limiting, logging, or transformation layers (see the sketch after this list).
Error Handling: A proxy server provides better error handling and response formatting for clients.
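For example, only a small amount of code layered in front of the request handler is needed for rate-limiting. The sketch below is hypothetical and not part of the tutorial's proxy; the window size and request limit are assumed values:
// Hypothetical in-memory rate limiter that could wrap the proxy's request handler
const WINDOW_MS = 60_000;   // 1-minute window (assumed)
const MAX_REQUESTS = 30;    // max requests per client per window (assumed)
const hits = new Map();     // client IP -> array of recent request timestamps

function allowRequest(ip) {
  const now = Date.now();
  const recent = (hits.get(ip) || []).filter(t => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) return false;
  recent.push(now);
  hits.set(ip, recent);
  return true;
}

// Inside the server callback, before forwarding to Ollama:
// if (!allowRequest(req.socket.remoteAddress)) {
//   res.writeHead(429, { 'Content-Type': 'application/json' });
//   res.end(JSON.stringify({ error: 'Too many requests' }));
//   return;
// }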
Prerequisites
Before we dive in, make sure you have the following ready:
AWS EC2 Instance: A g4dn.xlarge* or equivalent instance with at least 8GB of RAM for running the DeepSeek-R1 model.
Node.js 23+ installed locally for development.
Docker installed on the EC2 instance.
Ollama installed on the EC2 instance. You can install it with:
curl -fsSL https://ollama.com/install.sh | sudo sh
*Note: The DeepSeek-R1 7B model requires at least 8GB of RAM. This tutorial uses a g4dn.xlarge AWS EC2 instance with Docker installed. A g4dn.xlarge instance costs ~$0.526 per hour on Linux, which works out to roughly $390 per month if left running continuously.
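Once the instance is running, you can optionally sanity-check the hardware and your local Node.js version before continuing. These commands assume Amazon Linux with the NVIDIA driver installed (nvidia-smi will not be available otherwise):
free -h        # on the EC2 instance: confirm at least 8GB of RAM
nvidia-smi     # on the EC2 instance: confirm the GPU is visible (g4dn.xlarge provides a single NVIDIA T4)
node --version # on your local machine: confirm Node.js 23+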
Step 1: Setting Up Your EC2 Instance
Launch an EC2 Instance: Use an instance type like g4dn.xlarge and ensure it has a public IP with a security group allowing SSH (port 22) and HTTP (port 8000) traffic. For this tutorial, we're assuming this EC2 instance is publicly accessible via a public subnet of a VPC.
Install Docker: SSH into the instance and run:
sudo yum install -y docker
sudo systemctl start docker
sudo usermod -aG docker $USER
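Note that the usermod group change only takes effect for new login sessions, so you may need to log out and back in before running Docker without sudo. You can optionally confirm the Docker daemon is running with:
docker info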
Install Ollama: Install Ollama with:
curl -fsSL https://ollama.com/install.sh | sudo sh
Pull the DeepSeek-R1 Model: Download the model using:
ollama pull deepseek-r1:7b
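Optionally, confirm the model downloaded successfully by listing the locally available models:
ollama list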
Testing Ollama: Test that Ollama is working by running:
ollama run deepseek-r1:7b
Once you're done testing, you can exit by pressing Ctrl+d or typing /bye.
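Because the proxy will talk to Ollama over its HTTP API (port 11434 by default), you can also verify that endpoint directly from the instance. The request body mirrors what the proxy will send, for example:
curl http://localhost:11434/api/chat -d '{ "model": "deepseek-r1:7b", "messages": [{ "role": "user", "content": "Hello" }] }'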
Step 2: Writing the Node.js Proxy Server
The Node.js proxy server will route requests to Ollama. Here's the breakdown of the setup (avoiding TypeScript for simplicity):
index.js: The Proxy Logic
// Import the built-in http module for creating an HTTP server
import http from 'http';
// Import node-fetch for making HTTP requests to Ollama
import fetch from 'node-fetch';
// Set the Ollama host from an environment variable, defaulting to the local Ollama server
const OLLAMA_HOST = process.env.OLLAMA_HOST || 'http://localhost:11434';
// Specify which model to use
const MODEL = 'deepseek-r1:7b';
// Set the port for our proxy server
const PORT = process.env.PORT || 8000;
// Function to handle streaming responses from Ollama
async function streamResponse(req, res) {
  // Extract the prompt from the request body, guarding against malformed JSON
  let prompt;
  try {
    ({ prompt } = JSON.parse(req.body));
  } catch {
    res.writeHead(400, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'Invalid JSON body.' }));
    return;
  }
  // Validate that prompt exists and is a string
  if (!prompt || typeof prompt !== 'string') {
    res.writeHead(400, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'Invalid prompt: A non-empty string is required.' }));
    return;
  }
  try {
    // Make a POST request to Ollama's chat API
    const response = await fetch(`${OLLAMA_HOST}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: MODEL, messages: [{ role: 'user', content: prompt }], stream: true }),
    });
    // If Ollama returns an error, forward it to the client
    if (!response.ok) {
      res.writeHead(response.status, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: `Error from Ollama: ${response.statusText}` }));
      return;
    }
    // Set up streaming response to client
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    // Stream each chunk from Ollama to the client
    for await (const chunk of response.body) {
      res.write(chunk);
    }
    res.end();
  } catch (err) {
    // Log and handle any errors that occur
    console.error('Error:', err.message);
    res.writeHead(500, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'Internal server error' }));
  }
}
// Create an HTTP server
const server = http.createServer((req, res) => {
  // Only handle POST requests to /api/generate
  if (req.method === 'POST' && req.url === '/api/generate') {
    // Collect the request body data
    let body = '';
    req.on('data', chunk => (body += chunk));
    // When all data is received, process the request
    req.on('end', () => {
      req.body = body;
      streamResponse(req, res);
    });
  } else {
    // Return 404 for all other routes
    res.writeHead(404, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'Not found' }));
  }
});
// Start the server and listen on the specified port
server.listen(PORT, () => {
  console.log(`Server is running on http://localhost:${PORT}`);
});
Dockerfile: Containerizing the Server
# Use Node.js 23-alpine as the base image
FROM node:23-alpine
# Set the working directory inside the container
WORKDIR /app
# Copy package.json and install dependencies
COPY package*.json ./
RUN npm install --production
# Copy the entire project code to the container
COPY . .
# Expose port 8000 for the proxy server
EXPOSE 8000
# Start the server when the container runs
CMD ["npm", "start"]
package.json: Dependencies
{
  "name": "node-ollama-proxy",
  "version": "1.0.0",
  "type": "module",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "node-fetch": "^3.3.2"
  }
}
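Before containerizing, you can also try the proxy directly on your development machine, assuming you have Ollama running locally as well (otherwise requests will fail to connect):
npm install
OLLAMA_HOST=http://localhost:11434 PORT=8000 node index.js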
Step 3: Deploying to EC2
Archive and Upload Locally:
tar --exclude='.git' -czvf example-DeepSeek-R1.tar.gz Dockerfile package.json index.js
scp -i your-key-pair.pem example-DeepSeek-R1.tar.gz ec2-user@<EC2-PUBLIC-IP>:~
SSH into the EC2 Instance:
ssh -i your-key-pair.pem ec2-user@<EC2-PUBLIC-IP>
Extract and Build on EC2:
tar -xzvf example-DeepSeek-R1.tar.gz
docker build -t node-ollama-proxy .
Run the Proxy Server: Use the host network so the proxy container can reach the Ollama server listening on localhost:
docker run --rm --network host -e OLLAMA_HOST=http://localhost:11434 -d node-ollama-proxy
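You can optionally confirm the container started and the proxy is listening by checking Docker's process list and logs (the container ID placeholder is whatever docker ps reports):
docker ps
docker logs <container-id>   # should print "Server is running on http://localhost:8000"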
Step 4: Testing the Setup
From your local machine, send a request to the EC2 instance:
curl http://<EC2-PUBLIC-IP>:8000/api/generate \
-H 'Content-Type: application/json' \
-d '{ "prompt": "What is 1 + 1?" }'
The server will forward the request to Ollama, process it using the DeepSeek-R1 model, and stream the response back.
If everything is working correctly, you should see a response like:
{"model":"deepseek-r1:7b","created_at":"2025-01-24T16:40:46.050479Z","message":{"role":"assistant","content":"boxed"},"done":false}
{"model":"deepseek-r1:7b","created_at":"2025-01-24T16:40:46.063721Z","message":{"role":"assistant","content":"{"},"done":false}
{"model":"deepseek-r1:7b","created_at":"2025-01-24T16:40:46.076935Z","message":{"role":"assistant","content":"2"},"done":false}
{"model":"deepseek-r1:7b","created_at":"2025-01-24T16:40:46.089978Z","message":{"role":"assistant","content":"}\n"},"done":false}
{"model":"deepseek-r1:7b","created_at":"2025-01-24T16:40:46.103315Z","message":{"role":"assistant","content":"\\"},"done":false}
{"model":"deepseek-r1:7b","created_at":"2025-01-24T16:40:46.116519Z","message":{"role":"assistant","content":"]"},"done":false}
{"model":"deepseek-r1:7b","created_at":"2025-01-24T16:40:46.129801Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":1323640041,"load_duration":26338291,"prompt_eval_count":11,"prompt_eval_duration":211000000,"eval_count":83,"eval_duration":1084000000}
In the final chunk, done_reason is stop, indicating the model has finished processing the request.
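If you prefer to consume the stream programmatically instead of with curl, a minimal Node.js client sketch might look like the following. It is illustrative only and not part of the tutorial's repository; it assumes Node 18+ with built-in fetch, is run as an ES module (for example client.mjs), and <EC2-PUBLIC-IP> should be replaced with your instance's address:
// Minimal streaming client sketch for the proxy (illustrative only)
const response = await fetch('http://<EC2-PUBLIC-IP>:8000/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'What is 1 + 1?' }),
});

const decoder = new TextDecoder();
let buffered = '';
// The proxy relays Ollama's newline-delimited JSON chunks; parse them line by line
for await (const chunk of response.body) {
  buffered += decoder.decode(chunk, { stream: true });
  let newline;
  while ((newline = buffered.indexOf('\n')) !== -1) {
    const line = buffered.slice(0, newline).trim();
    buffered = buffered.slice(newline + 1);
    if (!line) continue;
    const data = JSON.parse(line);
    // Print each content token as it arrives
    if (data.message && data.message.content) process.stdout.write(data.message.content);
  }
}
process.stdout.write('\n');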
Limitations
This example was designed purely for demonstration purposes and is not production-grade. With additional modifications, however, it could be adapted for more robust small-to-medium-scale deployments. If the hosting machine is sufficiently powerful, this setup could plausibly support a few hundred monthly users.
During testing of the DeepSeek-R1 7B model, it became apparent that the model includes some degree of censorship, as evidenced by responses to questions about Taiwan—an independent country—where it aligns with narratives that it is part of China. This suggests the training data includes influence from censorship policies.
For tasks that do not involve topics sensitive to such censorship concerns, like simple reasoning or general-purpose queries, the model demonstrates utility and could be a more cost-efficient alternative to hosted LLM providers.
For production scenarios, consider adding:
Authentication to secure the proxy server (a minimal sketch follows this list).
Rate-limiting to mitigate abuse and manage resource allocation.
Integration with CI/CD pipelines for automated updates.
Monitoring tools to track server health and model performance.
Robust error handling for improved reliability.
Service management to configure Ollama to run as a service on the EC2 instance, ensuring it can automatically restart if it crashes.
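As one illustration of the first item, a shared-secret check could be added to the proxy's request handler. This is a hypothetical sketch: PROXY_API_KEY and isAuthorized are assumptions, not part of the tutorial's code.
// Hypothetical shared-secret authentication for the proxy (not in the original code)
const API_KEY = process.env.PROXY_API_KEY;

function isAuthorized(req) {
  // Expect clients to send: Authorization: Bearer <key>
  return Boolean(API_KEY) && req.headers['authorization'] === `Bearer ${API_KEY}`;
}

// Inside the http.createServer callback, before proxying the request:
// if (!isAuthorized(req)) {
//   res.writeHead(401, { 'Content-Type': 'application/json' });
//   res.end(JSON.stringify({ error: 'Unauthorized' }));
//   return;
// }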
Conclusion
This guide has shown you how to deploy a simple proxy server to host the DeepSeek-R1 7B LLM using Ollama and Docker. By using the host network, the model remains accessible only locally while the Node.js proxy is the sole component exposed to external clients.
Following these steps, you can now efficiently experiment with advanced AI models for small-to-medium-scale applications.
Questions? Comments? Contact me