<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Mahdi Shadi]]></title><description><![CDATA[DevOps Engineer, Linux Lover, Chess Player]]></description><link>https://blog.mahdishadi.me</link><generator>RSS for Node</generator><lastBuildDate>Thu, 09 Apr 2026 20:55:51 GMT</lastBuildDate><atom:link href="https://blog.mahdishadi.me/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Deploying Jitsi Meet Behind Traefik with Docker Compose (Custom SSL PEM/KEY + Host Authentication)]]></title><description><![CDATA[This guide walks through how we deployed Jitsi Meet using Docker Compose, placed it behind Traefik, terminated TLS using our own certificate (.pem + .key) via Traefik’s File Provider, and enabled Host Authentication using internal auth (Prosody users...]]></description><link>https://blog.mahdishadi.me/deploying-jitsi-meet-behind-traefik-with-docker</link><guid isPermaLink="true">https://blog.mahdishadi.me/deploying-jitsi-meet-behind-traefik-with-docker</guid><category><![CDATA[Docker]]></category><category><![CDATA[jitsi meet]]></category><category><![CDATA[Traefik]]></category><category><![CDATA[Docker compose]]></category><dc:creator><![CDATA[Mahdi Shadi]]></dc:creator><pubDate>Mon, 16 Feb 2026 07:54:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771228221999/20c51f41-abe4-4fee-b27c-703b0c182e49.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This guide walks through how we deployed <strong>Jitsi Meet</strong> using <strong>Docker Compose</strong>, placed it <strong>behind Traefik</strong>, terminated <strong>TLS using our own certificate</strong> (<code>.pem</code> + <code>.key</code>) via Traefik’s <strong>File Provider</strong>, and enabled <strong>Host 
Authentication</strong> using <strong>internal auth</strong> (Prosody users).</p>
<hr />
<h2 id="heading-1-target-setup-and-architecture">1) Target Setup and Architecture</h2>
<p>Goals:</p>
<ul>
<li><p>Traefik is already listening on <strong>443</strong> (used by other services).</p>
</li>
<li><p>Jitsi must run <strong>behind Traefik</strong> and not take over port 443 directly.</p>
</li>
<li><p>TLS must be served using a <strong>custom certificate</strong> (not Let’s Encrypt).</p>
</li>
<li><p>Enable <strong>Host Authentication</strong> (internal auth), so only authenticated users can host/start meetings (and optionally allow guests to join).</p>
</li>
</ul>
<p>Traffic flow:</p>
<pre><code>Internet (https://meet.shonizcloud.ir)
          |
          v
     Traefik :443    (TLS termination using custom cert)
          |
          v
    Jitsi Web :80     (inside Docker network)
          |
          +--&gt; Prosody / Jicofo / JVB ...
</code></pre>
<hr />
<h2 id="heading-2-get-the-official-jitsi-docker-compose-files-release-zip">2) Get the Official Jitsi Docker Compose Files (Release ZIP)</h2>
<p>We keep all Docker-based apps under a single parent folder so backup is easy (zip the parent folder).</p>
<p>Create the folder structure:</p>
<pre><code class="lang-bash">mkdir -p docker/jitsi
<span class="hljs-built_in">cd</span> docker/jitsi
</code></pre>
<p>Download the latest official <code>docker-jitsi-meet</code> release ZIP:</p>
<pre><code class="lang-bash">wget $(curl -s https://api.github.com/repos/jitsi/docker-jitsi-meet/releases/latest | grep <span class="hljs-string">'zip'</span> | cut -d\" -f4)
</code></pre>
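<p>If <code>jq</code> is installed, the same download can be written a bit more robustly (a sketch; it reads the <code>zipball_url</code> field from the GitHub releases API and saves the archive as <code>jitsi-meet.zip</code>, so adjust the later <code>unzip</code> step accordingly):</p>
<pre><code class="lang-bash"># Resolve the latest release ZIP URL explicitly instead of grepping raw JSON
url=$(curl -s https://api.github.com/repos/jitsi/docker-jitsi-meet/releases/latest | jq -r '.zipball_url')
wget -O jitsi-meet.zip "$url"
</code></pre>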
<p>Install <code>unzip</code>:</p>
<h3 id="heading-ubuntu-debian">Ubuntu / Debian</h3>
<pre><code class="lang-bash">sudo apt install unzip
</code></pre>
<h3 id="heading-redhat-centos-fedora">RedHat / CentOS / Fedora</h3>
<pre><code class="lang-bash">sudo dnf install unzip
</code></pre>
<p>Unzip the downloaded release (adjust <code>&lt;version&gt;</code> so the name matches the file you actually downloaded):</p>
<pre><code class="lang-bash">unzip ./stable-&lt;version&gt;
</code></pre>
<p>Rename the extracted folder to something clean:</p>
<pre><code class="lang-bash">mv jitsi-docker-jitsi-meet-&lt;hash-or-version&gt; jitsi-meet
<span class="hljs-built_in">cd</span> jitsi-meet
</code></pre>
<hr />
<h2 id="heading-3-copy-environment-and-compose-files">3) Copy Environment and Compose Files</h2>
<p>Create your working <code>.env</code> from the example:</p>
<pre><code class="lang-bash">cp env.example .env
</code></pre>
<p>Copy the compose file to the newer naming convention (<code>compose.yaml</code>) and back up the original:</p>
<pre><code class="lang-bash">cp docker-compose.yml compose.yaml
mv docker-compose.yml docker-compose.yml.bak
</code></pre>
<hr />
<h2 id="heading-4-edit-env-config-path-domain-ips-and-internal-authentication">4) Edit <code>.env</code> (Config Path, Domain, IPs, and Internal Authentication)</h2>
<p>Open <code>.env</code>:</p>
<pre><code class="lang-bash">nano .env
</code></pre>
<h3 id="heading-41-put-the-config-volume-inside-the-project-directory">4.1 Put the config volume inside the project directory</h3>
<p>This keeps all persistent configuration under the project folder (easy backups):</p>
<pre><code class="lang-yaml"><span class="hljs-string">CONFIG=./.jitsi-meet-cfg</span>
</code></pre>
<h3 id="heading-42-ports">4.2 Ports</h3>
<p>If the default ports (<code>8000</code>, <code>8443</code>) are free, you can keep them. If other apps use them, change them here and remember the values for your reverse proxy design.</p>
<h3 id="heading-43-timezone">4.3 Timezone</h3>
<p>Set the correct timezone for your server, for example:</p>
<pre><code class="lang-yaml"><span class="hljs-string">TZ=Europe/Berlin</span>
</code></pre>
<h3 id="heading-44-publicurl">4.4 PUBLIC_URL</h3>
<p>This is very important. It must match the <strong>external URL</strong> users will browse to:</p>
<pre><code class="lang-yaml"><span class="hljs-string">PUBLIC_URL=https://meet.shonizcloud.ir</span>
</code></pre>
<h3 id="heading-45-jvbadvertiseips">4.5 JVB_ADVERTISE_IPS</h3>
<p>For NAT or multi-path access scenarios, you can set:</p>
<pre><code class="lang-yaml"><span class="hljs-string">JVB_ADVERTISE_IPS=192.168.21.14,209.41.5.158</span>
</code></pre>
<p><strong>Very important note:</strong><br />The <strong>public IP</strong> you put here must be <strong>the exact public IP address that external users see and connect to</strong> (i.e., the Internet/WAN-facing IP).<br />If your server is behind NAT/firewall, this should <strong>not</strong> be your internal/private IP—use the <strong>public (NATed) IP</strong> that inbound traffic actually reaches. Optionally, you can include your private LAN IP first, then the public IP.</p>
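<p>If you’re not sure which public IP external users actually reach, you can check from the server itself (this assumes outbound internet access; <code>ifconfig.me</code> is just one of several such services):</p>
<pre><code class="lang-bash"># Ask an external service which source IP it sees for this server
curl -4 -s https://ifconfig.me
</code></pre>
<p>Compare the result with the IP your users resolve for the domain; behind NAT the two can differ, and the NATed public IP is what belongs in <code>JVB_ADVERTISE_IPS</code>.</p>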
<h3 id="heading-46-enable-internal-authentication-host-authentication">4.6 Enable Internal Authentication (Host Authentication)</h3>
<p>In the Authentication section, enable:</p>
<pre><code class="lang-yaml"><span class="hljs-string">ENABLE_AUTH=1</span>
<span class="hljs-string">ENABLE_GUESTS=1</span>
<span class="hljs-string">AUTH_TYPE=internal</span>
</code></pre>
<ul>
<li><p>With <code>ENABLE_GUESTS=1</code>, unauthenticated users can join once a host starts the meeting.</p>
</li>
<li><p>If you don’t want guests at all, set <code>ENABLE_GUESTS=0</code> (or leave it commented out).</p>
</li>
</ul>
<h3 id="heading-47-generate-strong-internal-passwords">4.7 Generate Strong Internal Passwords</h3>
<p>Exit the editor and run:</p>
<pre><code class="lang-bash">./gen-passwords.sh
</code></pre>
<p>Re-open <code>.env</code> and verify the Security section got populated with strong random passwords.</p>
<h3 id="heading-48-enable-restart-policy">4.8 Enable Restart Policy</h3>
<p>Set:</p>
<pre><code class="lang-yaml"><span class="hljs-string">RESTART_POLICY=unless-stopped</span>
</code></pre>
<hr />
<h2 id="heading-5-pull-images-and-start-jitsi">5) Pull Images and Start Jitsi</h2>
<p>Pull all images:</p>
<pre><code class="lang-bash">docker compose pull
</code></pre>
<p>Start the stack and follow logs:</p>
<pre><code class="lang-bash">docker compose up -d &amp;&amp; docker compose logs -f
</code></pre>
<hr />
<h2 id="heading-6-create-prosody-users-for-authentication-recommended-use-default-meetjitsi">6) Create Prosody Users for Authentication (Recommended: Use Default <code>meet.jitsi</code>)</h2>
<p>If you try to create a Prosody user and see:</p>
<pre><code>The given hostname does not exist in the config
</code></pre><p>it means the domain you used is not a configured <code>VirtualHost</code> in Prosody (or you are pointing <code>prosodyctl</code> at the wrong config).</p>
<p>To see the configured VirtualHosts:</p>
<pre><code class="lang-bash">docker compose <span class="hljs-built_in">exec</span> prosody sh -lc <span class="hljs-string">"grep -RIn 'VirtualHost' /config | head -n 50"</span>
</code></pre>
<p>In <code>docker-jitsi-meet</code>, VirtualHosts are commonly defined here:</p>
<ul>
<li><code>/config/conf.d/jitsi-meet.cfg.lua</code></li>
</ul>
<blockquote>
<p><strong>Recommendation:</strong> To avoid unnecessary complexity with XMPP domain changes, create users using the default internal XMPP domain <code>meet.jitsi</code>, even if your external web domain is <a target="_blank" href="http://meet.shonizcloud.ir"><code>meet.shonizcloud.ir</code></a>.</p>
</blockquote>
<p>Create a user like this:</p>
<pre><code class="lang-bash">docker compose <span class="hljs-built_in">exec</span> prosody prosodyctl --config /config/conf.d/jitsi-meet.cfg.lua \
  register mahdi meet.jitsi mahdi123
</code></pre>
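<p>For later user management, <code>prosodyctl</code> also provides <code>passwd</code> and <code>deluser</code>, which take a full JID (same <code>--config</code> flag as above; <code>passwd</code> prompts interactively):</p>
<pre><code class="lang-bash"># Change a user's password (interactive prompt)
docker compose exec prosody prosodyctl --config /config/conf.d/jitsi-meet.cfg.lua \
  passwd mahdi@meet.jitsi

# Remove a user
docker compose exec prosody prosodyctl --config /config/conf.d/jitsi-meet.cfg.lua \
  deluser mahdi@meet.jitsi
</code></pre>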
<hr />
<h2 id="heading-7-put-jitsi-behind-traefik">7) Put Jitsi Behind Traefik</h2>
<p>In this design, Traefik handles HTTPS and forwards traffic internally to Jitsi over HTTP (port 80 inside the Docker network).</p>
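<p>Concretely, the Jitsi <code>web</code> service has to share a Docker network with Traefik. A minimal sketch of the compose change (this assumes Traefik’s network is named <code>radar</code>, as in the Traefik compose used here; substitute your own network name):</p>
<pre><code class="lang-yaml">services:
  web:
    networks:
      - meet.jitsi   # the stack's internal network
      - radar        # Traefik's network

networks:
  radar:
    external: true
</code></pre>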
<hr />
<h2 id="heading-8-full-traefik-configuration-for-custom-ssl-pemkey-using-file-provider">8) Full Traefik Configuration for Custom SSL (PEM/KEY) Using File Provider</h2>
<p>We disabled Let’s Encrypt for this site and used our own certificate.</p>
<h3 id="heading-81-traefik-docker-compose-full-config-used">8.1 Traefik Docker Compose (full config used)</h3>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">traefik:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">"docker-mirror.kubarcloud.com/traefik"</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>
    <span class="hljs-attr">command:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--api.insecure=true"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--providers.docker=true"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--providers.docker.exposedbydefault=false"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--providers.file.filename=/dynamic/tls.yml"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--providers.file.watch=true"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--entrypoints.web.address=:80"</span>
     <span class="hljs-comment"># - "--entrypoints.web.http.redirections.entryPoint.to=websecure"</span>
     <span class="hljs-comment"># - "--entrypoints.web.http.redirections.entrypoint.scheme=https"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--entrypoints.websecure.address=:443"</span>
    <span class="hljs-comment">#  - "--certificatesresolvers.mytlschallenge.acme.tlschallenge=true"</span>
    <span class="hljs-comment">#  - "--certificatesresolvers.mytlschallenge.acme.email=mahdishadi99@gmail.com"</span>
    <span class="hljs-comment">#  - "--certificatesresolvers.mytlschallenge.acme.storage=/letsencrypt/acme.json"</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"80:80"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"443:443"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"8080:8080"</span>
    <span class="hljs-attr">networks:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">radar</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik_data:/letsencrypt</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">/var/run/docker.sock:/var/run/docker.sock:ro</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">/root/Dashboard-Internet/dynamic:/dynamic:ro</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">/root/Jitsi-meet/keys:/etc/traefik/certs:ro</span>
</code></pre>
<p>Notes:</p>
<ul>
<li><p><code>--providers.file.filename=/dynamic/tls.yml</code> tells Traefik to load TLS certs from this file.</p>
</li>
<li><p>We mount the cert directory so Traefik can read the PEM/KEY inside the container:</p>
<ul>
<li><code>/root/Jitsi-meet/keys</code> → <code>/etc/traefik/certs</code></li>
</ul>
</li>
</ul>
<h3 id="heading-82-tls-config-file-tlsyml">8.2 TLS config file (<code>tls.yml</code>)</h3>
<p>Path on the host:</p>
<p><code>/root/Dashboard-Internet/dynamic/tls.yml</code></p>
<p>Content:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">tls:</span>
  <span class="hljs-attr">certificates:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">certFile:</span> <span class="hljs-string">/etc/traefik/certs/meet.shonizcloud.ir.pem</span>
      <span class="hljs-attr">keyFile:</span> <span class="hljs-string">/etc/traefik/certs/meet.shonizcloud.ir.key</span>
</code></pre>
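<p>Optionally, the File Provider can also mark a certificate as the store-wide default, so Traefik presents it for TLS connections whose SNI matches no configured certificate (a sketch reusing the same files):</p>
<pre><code class="lang-yaml">tls:
  stores:
    default:
      defaultCertificate:
        certFile: /etc/traefik/certs/meet.shonizcloud.ir.pem
        keyFile: /etc/traefik/certs/meet.shonizcloud.ir.key
</code></pre>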
<p>This means the following files must exist on the host:</p>
<ul>
<li><p><code>/root/Jitsi-meet/keys/</code><a target="_blank" href="http://meet.shonizcloud.ir"><code>meet.shonizcloud.ir</code></a><code>.pem</code></p>
</li>
<li><p><code>/root/Jitsi-meet/keys/</code><a target="_blank" href="http://meet.shonizcloud.ir"><code>meet.shonizcloud.ir</code></a><code>.key</code></p>
</li>
</ul>
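<p>Before pointing Traefik at the files, it’s worth confirming that the PEM and KEY actually belong together; a mismatched pair makes Traefik fall back to its own self-signed default certificate. A quick sanity check (the first command only creates a throwaway self-signed pair so the check has something to run against; skip it if your real files are already in place, and run it in an empty directory so nothing gets overwritten):</p>
<pre><code class="lang-bash"># Testing only: create a throwaway self-signed pair in an empty directory.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=meet.shonizcloud.ir" \
  -keyout meet.shonizcloud.ir.key -out meet.shonizcloud.ir.pem

# The cert and the key must carry the same public key.
cert_hash=$(openssl x509 -noout -pubkey -in meet.shonizcloud.ir.pem | sha256sum)
key_hash=$(openssl pkey -pubout -in meet.shonizcloud.ir.key | sha256sum)
if [ "$cert_hash" = "$key_hash" ]; then echo "cert/key match"; fi
</code></pre>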
<hr />
<h2 id="heading-9-jitsi-labels-for-traefik-https-redirect-security-headers">9) Jitsi Labels for Traefik (HTTPS + Redirect + Security Headers)</h2>
<p>Apply these labels to the <strong>Jitsi web</strong> service to route <a target="_blank" href="http://meet.shonizcloud.ir"><code>meet.shonizcloud.ir</code></a> through Traefik:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">labels:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.enable=true</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.meet.rule=Host(`meet.shonizcloud.ir`)</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.meet.entrypoints=websecure</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.meet.tls=true</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.meet.middlewares=meet-headers@docker</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.services.meet-svc.loadbalancer.server.port=80</span>

  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.meet-headers.headers.STSSeconds=315360000</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.meet-headers.headers.forceSTSHeader=true</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.meet-headers.headers.STSIncludeSubdomains=true</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.meet-headers.headers.STSPreload=true</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.meet-headers.headers.browserXSSFilter=true</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.meet-headers.headers.contentTypeNosniff=true</span>

  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.meet-http.rule=Host(`meet.shonizcloud.ir`)</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.meet-http.entrypoints=web</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.meet-http.middlewares=meet-redirect@docker</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.meet-redirect.redirectscheme.scheme=https</span>
</code></pre>
<p>Key points:</p>
<ul>
<li><p><code>websecure</code> router serves HTTPS.</p>
</li>
<li><p>Separate <code>web</code> router redirects HTTP → HTTPS.</p>
</li>
<li><p><code>loadbalancer.server.port=80</code> tells Traefik to forward to the Jitsi web container on port 80.</p>
</li>
</ul>
<hr />
<h2 id="heading-10-verify-the-certificate-served-by-traefik">10) Verify the Certificate Served by Traefik</h2>
<p>To confirm Traefik is presenting your custom certificate:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> | openssl s_client -connect meet.shonizcloud.ir:443 -servername meet.shonizcloud.ir 2&gt;/dev/null \
  | openssl x509 -noout -subject -issuer -dates
</code></pre>
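<p>You can also confirm the HTTP → HTTPS redirect served by the <code>web</code> router (assuming DNS already points at this server):</p>
<pre><code class="lang-bash"># Expect a 3xx status and a Location header pointing at https://
curl -sI http://meet.shonizcloud.ir/ | grep -i -E '^(HTTP|location)'
</code></pre>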
<hr />
<h2 id="heading-11-summary">11) Summary</h2>
<p>What we achieved:</p>
<ul>
<li><p>Downloaded and deployed <code>docker-jitsi-meet</code> from the latest official release ZIP</p>
</li>
<li><p>Organized the project for easy backup (<code>docker/jitsi/jitsi-meet</code>)</p>
</li>
<li><p>Configured <code>.env</code> (including internal auth + generated strong passwords)</p>
</li>
<li><p>Started the Jitsi stack using Docker Compose</p>
</li>
<li><p>Recommended creating Prosody users on the default XMPP domain <code>meet.jitsi</code></p>
</li>
<li><p>Configured Traefik to terminate TLS using a custom PEM/KEY certificate via File Provider</p>
</li>
<li><p>Added complete Traefik labels to route Jitsi over HTTPS and redirect HTTP to HTTPS</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[When Logs Weren’t Enough: Setting Up Prometheus Behind Traefik (TLS)]]></title><description><![CDATA[At some point, logs alone stop being enough. You start seeing short spikes in CPU, random latency jumps, or brief outages that don’t leave a clear trace in the logs. That’s where real monitoring pays off: numbers, charts, history, and alerts.
This gu...]]></description><link>https://blog.mahdishadi.me/when-logs-werent-enough-setting-up-prometheus-behind-traefik-tls</link><guid isPermaLink="true">https://blog.mahdishadi.me/when-logs-werent-enough-setting-up-prometheus-behind-traefik-tls</guid><category><![CDATA[#prometheus]]></category><category><![CDATA[Traefik]]></category><category><![CDATA[nginx]]></category><category><![CDATA[TLS]]></category><category><![CDATA[#nodeexporter]]></category><dc:creator><![CDATA[Mahdi Shadi]]></dc:creator><pubDate>Fri, 30 Jan 2026 12:31:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769775310507/0476a691-b096-4a61-ac91-020198f574d3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At some point, logs alone stop being enough. You start seeing short spikes in CPU, random latency jumps, or brief outages that don’t leave a clear trace in the logs. That’s where real monitoring pays off: numbers, charts, history, and alerts.</p>
<p>This guide walks through a complete, practical setup:</p>
<ul>
<li><p>Run <strong>Prometheus</strong> with <strong>Docker Compose</strong></p>
</li>
<li><p>Put Prometheus behind <strong>Traefik</strong> with <strong>TLS</strong></p>
</li>
<li><p>Fix a common UI issue in newer Prometheus releases</p>
</li>
<li><p>Install <strong>Node Exporter</strong> on the host using <strong>systemd</strong></p>
</li>
<li><p>Connect containerized Prometheus to host-installed Node Exporter</p>
</li>
<li><p>Put <strong>Node Exporter behind Traefik</strong> as well, so it’s reachable via <strong>HTTPS</strong> (secured with an IP whitelist)</p>
</li>
</ul>
<hr />
<h2 id="heading-what-is-prometheus">What is Prometheus?</h2>
<p><strong>Prometheus</strong> is a time-series monitoring system. Exporters and services usually expose metrics on an HTTP endpoint such as <code>/metrics</code>. Prometheus periodically <strong>scrapes</strong> these endpoints, stores the data, and lets you query it using <strong>PromQL</strong> (and later build dashboards/alerts with Grafana/Alertmanager).</p>
<hr />
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>Docker + Docker Compose installed on your server</p>
</li>
<li><p>Traefik already running and capable of issuing TLS certificates (e.g., <code>certresolver=mytlschallenge</code>)</p>
</li>
<li><p>A Docker external network named <code>traefik</code></p>
</li>
<li><p>DNS records in place for:</p>
<ul>
<li><p><a target="_blank" href="http://prometheus.mahdishadi.me"><code>prometheus.mahdishadi.me</code></a></p>
</li>
<li><p><a target="_blank" href="http://nodeexporter.mahdishadi.me"><code>nodeexporter.mahdishadi.me</code></a></p>
</li>
</ul>
</li>
</ul>
<hr />
<h1 id="heading-part-1-run-prometheus-behind-traefik-with-tls">Part 1: Run Prometheus behind Traefik with TLS</h1>
<h2 id="heading-1-create-a-project-directory">1) Create a project directory</h2>
<pre><code class="lang-bash">mkdir -p /opt/monitoring/prometheus
<span class="hljs-built_in">cd</span> /opt/monitoring/prometheus
</code></pre>
<h2 id="heading-2-create-prometheusyml">2) Create <code>prometheus.yml</code></h2>
<p>Start simple: scrape Prometheus itself.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">global:</span>
  <span class="hljs-attr">scrape_interval:</span> <span class="hljs-string">15s</span>

<span class="hljs-attr">scrape_configs:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">job_name:</span> <span class="hljs-string">"prometheus"</span>
    <span class="hljs-attr">static_configs:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">targets:</span> [<span class="hljs-string">"prometheus:9090"</span>]
</code></pre>
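<p>Before starting anything, you can validate the file with <code>promtool</code>, which ships inside the Prometheus image (a sketch; run from the project directory):</p>
<pre><code class="lang-bash"># Syntax-check prometheus.yml without starting a server
docker run --rm -v "$PWD/prometheus.yml:/prometheus.yml:ro" \
  --entrypoint promtool prom/prometheus:latest check config /prometheus.yml
</code></pre>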
<h2 id="heading-3-create-docker-composeyml-prometheus-behind-traefik">3) Create <code>docker-compose.yml</code> (Prometheus behind Traefik)</h2>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">"3.8"</span>

<span class="hljs-attr">services:</span>
  <span class="hljs-attr">prometheus:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">prom/prometheus:latest</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">prometheus</span>

    <span class="hljs-comment"># Useful for debugging. If you want Traefik-only access, you can remove this.</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"9090:9090"</span>

    <span class="hljs-attr">labels:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.enable=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.prometheus.rule=Host(`prometheus.mahdishadi.me`)</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.prometheus.entrypoints=websecure</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.prometheus.tls=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.prometheus.tls.certresolver=mytlschallenge</span>

      <span class="hljs-comment"># Optional security headers</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.prometheus.headers.SSLRedirect=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.prometheus.headers.STSSeconds=315360000</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.prometheus.headers.browserXSSFilter=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.prometheus.headers.contentTypeNosniff=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.prometheus.headers.forceSTSHeader=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.prometheus.headers.STSIncludeSubdomains=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.middlewares.prometheus.headers.STSPreload=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik.http.routers.prometheus.middlewares=prometheus@docker</span>

    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./prometheus.yml:/etc/prometheus/prometheus.yml:ro</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">prometheus-data:/prometheus</span>

    <span class="hljs-attr">command:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--config.file=/etc/prometheus/prometheus.yml"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--storage.tsdb.path=/prometheus"</span>
      <span class="hljs-comment"># Helps if the new UI loads blank/spins on /query in some setups</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--enable-feature=old-ui"</span>

    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>
    <span class="hljs-attr">networks:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">traefik</span>

<span class="hljs-attr">volumes:</span>
  <span class="hljs-attr">prometheus-data:</span>

<span class="hljs-attr">networks:</span>
  <span class="hljs-attr">traefik:</span>
    <span class="hljs-attr">external:</span> <span class="hljs-literal">true</span>
</code></pre>
<h2 id="heading-4-start-prometheus">4) Start Prometheus</h2>
<pre><code class="lang-bash">docker compose up -d
</code></pre>
<p>Access:</p>
<ul>
<li><a target="_blank" href="https://prometheus.mahdishadi.me"><code>https://prometheus.mahdishadi.me</code></a></li>
</ul>
<hr />
<h1 id="heading-part-2-if-the-ui-is-blank-verify-prometheus-is-healthy">Part 2: If the UI is blank, verify Prometheus is healthy</h1>
<p>If the UI is stuck/blank, check the server directly via API endpoints:</p>
<pre><code class="lang-bash">curl -sS http://127.0.0.1:9090/-/ready
curl -sS http://127.0.0.1:9090/api/v1/status/buildinfo | head
curl -sS <span class="hljs-string">'http://127.0.0.1:9090/api/v1/query?query=up'</span>
</code></pre>
<ul>
<li><p>If you get “Ready” and the <code>up</code> query returns a result, Prometheus is healthy and it’s likely a frontend/UI issue.</p>
</li>
<li><p>Enabling <code>--enable-feature=old-ui</code> (already included above) typically fixes this scenario.</p>
</li>
</ul>
<hr />
<h1 id="heading-part-3-install-node-exporter-on-the-host-systemd">Part 3: Install Node Exporter on the host (systemd)</h1>
<p>Node Exporter provides host-level metrics: CPU, memory, disk, network, load, etc.</p>
<h2 id="heading-1-create-a-dedicated-user">1) Create a dedicated user</h2>
<pre><code class="lang-bash">sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter
</code></pre>
<h2 id="heading-2-check-system-architecture">2) Check system architecture</h2>
<pre><code class="lang-bash">uname -m
</code></pre>
<ul>
<li><p><code>x86_64</code> → <code>linux-amd64</code></p>
</li>
<li><p><code>aarch64</code> → <code>linux-arm64</code></p>
</li>
</ul>
<h2 id="heading-3-download-and-install-the-binary-example-amd64">3) Download and install the binary (example: amd64)</h2>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> /tmp
VER=<span class="hljs-string">"1.8.1"</span>

wget https://github.com/prometheus/node_exporter/releases/download/v<span class="hljs-variable">${VER}</span>/node_exporter-<span class="hljs-variable">${VER}</span>.linux-amd64.tar.gz
tar xvf node_exporter-<span class="hljs-variable">${VER}</span>.linux-amd64.tar.gz

sudo mv node_exporter-<span class="hljs-variable">${VER}</span>.linux-amd64/node_exporter /usr/<span class="hljs-built_in">local</span>/bin/
sudo chown node_exporter:node_exporter /usr/<span class="hljs-built_in">local</span>/bin/node_exporter
sudo chmod +x /usr/<span class="hljs-built_in">local</span>/bin/node_exporter
</code></pre>
<p><em>(If you’re on ARM64, download <code>node_exporter-${VER}.linux-arm64.tar.gz</code> instead.)</em></p>
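<p>Optionally, verify the downloaded tarball against the release’s published checksums before trusting it (the <code>sha256sums.txt</code> asset is part of each Node Exporter release; run this in <code>/tmp</code> where the tarball still sits):</p>
<pre><code class="lang-bash">wget https://github.com/prometheus/node_exporter/releases/download/v${VER}/sha256sums.txt
sha256sum -c sha256sums.txt --ignore-missing
</code></pre>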
<h2 id="heading-4-create-a-systemd-service">4) Create a systemd service</h2>
<pre><code class="lang-bash">sudo tee /etc/systemd/system/node_exporter.service &gt; /dev/null &lt;&lt;<span class="hljs-string">'EOF'</span>
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/<span class="hljs-built_in">local</span>/bin/node_exporter --web.listen-address=0.0.0.0:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
</code></pre>
<h2 id="heading-5-enable-and-start-the-service">5) Enable and start the service</h2>
<pre><code class="lang-bash">sudo systemctl daemon-reload
sudo systemctl <span class="hljs-built_in">enable</span> --now node_exporter
sudo systemctl status node_exporter --no-pager
</code></pre>
<h2 id="heading-6-test-it-locally">6) Test it locally</h2>
<pre><code class="lang-bash">curl -sS http://127.0.0.1:9100/metrics | head
</code></pre>
<hr />
<h1 id="heading-part-4-connect-containerized-prometheus-to-host-installed-node-exporter">Part 4: Connect containerized Prometheus to host-installed Node Exporter</h1>
<p>On Linux, <code>host.docker.internal</code> is not always available by default inside containers. The clean solution is to map it using Docker’s <code>host-gateway</code>.</p>
<h2 id="heading-1-add-extrahosts-to-prometheus-in-docker-composeyml">1) Add <code>extra_hosts</code> to Prometheus in <code>docker-compose.yml</code></h2>
<pre><code class="lang-yaml"><span class="hljs-attr">extra_hosts:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"host.docker.internal:host-gateway"</span>
</code></pre>
<p>Example (relevant section):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">prometheus:</span>
    <span class="hljs-string">...</span>
    <span class="hljs-attr">extra_hosts:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"host.docker.internal:host-gateway"</span>
</code></pre>
<p>Then recreate:</p>
<pre><code class="lang-bash">docker compose up -d --force-recreate
</code></pre>
<h2 id="heading-2-add-the-node-exporter-job-to-prometheusyml">2) Add the Node Exporter job to <code>prometheus.yml</code></h2>
<pre><code class="lang-yaml">  <span class="hljs-bullet">-</span> <span class="hljs-attr">job_name:</span> <span class="hljs-string">"Mahdi Host"</span>
    <span class="hljs-attr">static_configs:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">targets:</span> [<span class="hljs-string">"host.docker.internal:9100"</span>]
</code></pre>
<p>Final <code>prometheus.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">global:</span>
  <span class="hljs-attr">scrape_interval:</span> <span class="hljs-string">15s</span>

<span class="hljs-attr">scrape_configs:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">job_name:</span> <span class="hljs-string">"prometheus"</span>
    <span class="hljs-attr">static_configs:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">targets:</span> [<span class="hljs-string">"prometheus:9090"</span>]

  <span class="hljs-bullet">-</span> <span class="hljs-attr">job_name:</span> <span class="hljs-string">"Mahdi Host"</span>
    <span class="hljs-attr">static_configs:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">targets:</span> [<span class="hljs-string">"host.docker.internal:9100"</span>]
</code></pre>
<p>Recreate Prometheus:</p>
<pre><code class="lang-bash">docker compose up -d --force-recreate
</code></pre>
<p>In Prometheus UI:</p>
<ul>
<li><p><strong>Status → Targets</strong></p>
</li>
<li><p>The <code>Mahdi Host</code> job should be <strong>UP</strong></p>
</li>
</ul>
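<p>Once the target shows <strong>UP</strong>, a few PromQL queries in the Prometheus UI (<strong>Graph</strong> tab) confirm real data is flowing. The job label matches the <code>job_name</code> from the <code>prometheus.yml</code> above; the metric names are standard Node Exporter metrics:</p>
<pre><code class="lang-plaintext">up{job="Mahdi Host"}                              # 1 = last scrape succeeded
rate(node_cpu_seconds_total{mode!="idle"}[5m])    # per-core CPU usage
node_memory_MemAvailable_bytes
  / node_memory_MemTotal_bytes                    # fraction of RAM available
</code></pre>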
<hr />
<h1 id="heading-part-5-put-node-exporter-behind-traefik-with-https">Part 5: Put Node Exporter behind Traefik with HTTPS</h1>
<p>Node Exporter does not serve TLS out of the box (recent versions can via a <code>--web.config.file</code>, but terminating TLS at a reverse proxy is simpler here). To expose it via HTTPS, put Traefik in front of it and let Traefik terminate TLS.</p>
<h2 id="heading-1-ensure-traefik-can-reach-the-host">1) Ensure Traefik can reach the host</h2>
<p>In Traefik’s own compose file (the one that runs Traefik), add:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">traefik:</span>
    <span class="hljs-attr">extra_hosts:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"host.docker.internal:host-gateway"</span>
</code></pre>
<h2 id="heading-2-enable-traefik-file-provider">2) Enable Traefik File Provider</h2>
<p>This is the clean approach for services that are not Docker containers.</p>
<p>In Traefik’s compose:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">traefik:</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">/opt/traefik/dynamic:/etc/traefik/dynamic</span>
    <span class="hljs-attr">command:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--providers.file.directory=/etc/traefik/dynamic"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"--providers.file.watch=true"</span>
</code></pre>
<p>Create the directory:</p>
<pre><code class="lang-bash">sudo mkdir -p /opt/traefik/dynamic
</code></pre>
<h2 id="heading-3-create-a-dynamic-config-for-node-exporter-tls-ip-whitelist">3) Create a dynamic config for Node Exporter (TLS + IP whitelist)</h2>
<p>Create:</p>
<p><code>/opt/traefik/dynamic/nodeexporter.yml</code></p>
<pre><code class="lang-yaml"><span class="hljs-attr">http:</span>
  <span class="hljs-attr">routers:</span>
    <span class="hljs-attr">nodeexporter:</span>
      <span class="hljs-attr">rule:</span> <span class="hljs-string">Host(`nodeexporter.mahdishadi.me`)</span>
      <span class="hljs-attr">entryPoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">websecure</span>
      <span class="hljs-attr">tls:</span>
        <span class="hljs-attr">certResolver:</span> <span class="hljs-string">mytlschallenge</span>
      <span class="hljs-attr">service:</span> <span class="hljs-string">nodeexporter</span>
      <span class="hljs-attr">middlewares:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">nodeexp-ipwhitelist</span>

  <span class="hljs-attr">services:</span>
    <span class="hljs-attr">nodeexporter:</span>
      <span class="hljs-attr">loadBalancer:</span>
        <span class="hljs-attr">servers:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">url:</span> <span class="hljs-string">"http://host.docker.internal:9100"</span>

  <span class="hljs-attr">middlewares:</span>
    <span class="hljs-attr">nodeexp-ipwhitelist:</span>
      <span class="hljs-attr">ipWhiteList:</span>
        <span class="hljs-attr">sourceRange:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"YOUR_PUBLIC_IP/32"</span>
</code></pre>
<p>Get your public IP:</p>
<pre><code class="lang-bash">curl -s https://ifconfig.me
</code></pre>
<p>Replace <code>YOUR_PUBLIC_IP/32</code> with the output, e.g. <code>1.2.3.4/32</code>. Note that in Traefik v3 this middleware is named <code>ipAllowList</code>; <code>ipWhiteList</code> is the deprecated v2 name, so use whichever matches your Traefik version.</p>
<p>Recreate Traefik:</p>
<pre><code class="lang-bash">docker compose up -d --force-recreate
</code></pre>
<p>Access (Node Exporter’s main endpoint):</p>
<ul>
<li><a target="_blank" href="https://nodeexporter.mahdishadi.me/metrics"><code>https://nodeexporter.mahdishadi.me/metrics</code></a></li>
</ul>
<blockquote>
<p>Node Exporter serves metrics on <code>/metrics</code>. Hitting <code>/</code> may not show useful output.</p>
</blockquote>
<hr />
<h1 id="heading-part-6-dealing-with-403-forbidden-on-node-exporter-https">Part 6: Dealing with <code>403 Forbidden</code> on Node Exporter HTTPS</h1>
<p>A 403 here is usually caused by the IP whitelist:</p>
<ul>
<li><p>Your current IP is not included in <code>sourceRange</code></p>
</li>
<li><p>You are on VPN (your public IP changed)</p>
</li>
</ul>
<p>Update <code>sourceRange</code> with your actual public IP and reload/recreate Traefik.</p>
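<p>A quick way to reason about a 403: compare your current public IP with the CIDR in <code>sourceRange</code>. The sketch below uses hard-coded example values so it runs anywhere; on the real host you would set <code>current</code> from <code>curl -s https://ifconfig.me</code> and <code>allowed</code> from the entry in <code>/opt/traefik/dynamic/nodeexporter.yml</code>:</p>
<pre><code class="lang-bash"># Example values (assumptions for illustration).
current="1.2.3.4"        # in practice: curl -s https://ifconfig.me
allowed="1.2.3.4/32"     # in practice: the sourceRange entry

# Strip the /32 suffix and compare.
if [ "${allowed%/32}" = "$current" ]; then
  echo "IP matches the whitelist"
else
  echo "IP mismatch: update sourceRange and reload Traefik"
fi
</code></pre>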
<hr />
<h2 id="heading-quick-verification-commands">Quick verification commands</h2>
<h3 id="heading-node-exporter-directly-on-the-host">Node Exporter directly on the host</h3>
<pre><code class="lang-bash">curl -sS http://127.0.0.1:9100/metrics | head
</code></pre>
<h3 id="heading-node-exporter-via-traefik-https">Node Exporter via Traefik HTTPS</h3>
<pre><code class="lang-bash">curl -sS https://nodeexporter.mahdishadi.me/metrics | head
</code></pre>
<h3 id="heading-prometheus-container-scraping-the-host">Prometheus container scraping the host</h3>
<pre><code class="lang-bash">docker <span class="hljs-built_in">exec</span> -it prometheus sh -c <span class="hljs-string">'wget -qO- http://host.docker.internal:9100/metrics | head'</span>
</code></pre>
<hr />
<h2 id="heading-security-note-important">Security note (important)</h2>
<p>Node Exporter ships with no authentication enabled by default. If exposed to the internet, always protect it:</p>
<ul>
<li><p>IP whitelist (as shown), and/or</p>
</li>
<li><p>Basic auth, and/or</p>
</li>
<li><p>keep it internal-only and let Prometheus scrape it privately</p>
</li>
</ul>
<p>Prometheus scraping does not require HTTPS; HTTPS is mainly for safe human/browser access.</p>
]]></content:encoded></item><item><title><![CDATA[How I Put n8n Behind Traefik with Automatic HTTPS: A Real-World Traefik + n8n Setup]]></title><description><![CDATA[This all started with a pretty simple goal:

“Run n8n on my server and expose it cleanly at https://n-test.mahdishadi.me.”

I could have just exposed n8n’s port directly on the server and called it a day. But I wanted something more:

Clean domain-base...]]></description><link>https://blog.mahdishadi.me/how-i-put-n8n-behind-traefik-with-automatic-https-a-real-world-traefik-n8n-setup</link><guid isPermaLink="true">https://blog.mahdishadi.me/how-i-put-n8n-behind-traefik-with-automatic-https-a-real-world-traefik-n8n-setup</guid><category><![CDATA[Traefik]]></category><category><![CDATA[n8n]]></category><category><![CDATA[containers]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Let's Encrypt]]></category><category><![CDATA[https]]></category><category><![CDATA[cloud native]]></category><dc:creator><![CDATA[Mahdi Shadi]]></dc:creator><pubDate>Tue, 02 Dec 2025 06:10:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764655506033/104fa55d-54f5-479c-baca-a9d8f1bc8cdd.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This all started with a pretty simple goal:</p>
<blockquote>
<p>“Run <strong>n8n</strong> on my server and expose it cleanly at<br /><a target="_blank" href="https://n-test.mahdishadi.me"><code>https://n-test.mahdishadi.me</code></a>.”</p>
</blockquote>
<p>I could have just exposed n8n’s port directly on the server and called it a day.<br />But I wanted something more:</p>
<ul>
<li><p>Clean domain-based routing</p>
</li>
<li><p>Proper HTTPS with Let’s Encrypt</p>
</li>
<li><p>A setup that scales nicely when I add more services</p>
</li>
</ul>
<p>So I went for a <strong>cloud-native</strong> approach with <strong>Traefik</strong> in front and n8n behind it in Docker.</p>
<p>Of course, it didn’t “just work.” There were 404s, 308s, ACME errors, missing volumes, and plenty of <code>docker exec</code> + <code>curl</code>.<br />Here’s the full story of what I did, why I did it, and how I debugged the problems along the way.</p>
<h2 id="heading-why-traefik-and-why-cloud-native-matters-here">Why Traefik? And Why Cloud-Native Matters Here</h2>
<h3 id="heading-what-is-traefik">What is Traefik?</h3>
<p>Traefik is an <strong>edge router / reverse proxy</strong> built for the cloud-native world.</p>
<p>Instead of manually editing a huge config file and defining all your backends, Traefik:</p>
<ul>
<li><p>Discovers services automatically (via Docker, Kubernetes, Consul, etc.)</p>
</li>
<li><p>Configures routes based on <strong>labels</strong> or <strong>Ingress</strong> definitions</p>
</li>
<li><p>Handles <strong>Let’s Encrypt</strong> and automatic certificate renewal</p>
</li>
<li><p>Offers middlewares (redirects, auth, rate limiting, etc.)</p>
</li>
<li><p>Ships with a web dashboard and metrics integration</p>
</li>
</ul>
<h3 id="heading-why-use-it-instead-of-just-exposing-n8n-directly">Why use it instead of just exposing n8n directly?</h3>
<p>When you have more than one service, the “just expose a random port” approach gets ugly fast:</p>
<ul>
<li><p>You end up with URLs like <code>server-ip:5678</code>, <code>server-ip:9000</code>, etc.</p>
</li>
<li><p>Managing HTTPS per service becomes painful.</p>
</li>
<li><p>Security and organization become a mess.</p>
</li>
</ul>
<p>With Traefik:</p>
<ul>
<li><p>You expose <strong>only</strong> Traefik to the internet (ports 80/443).</p>
</li>
<li><p>All internal services (like n8n) live on a private Docker network.</p>
</li>
<li><p>You route based on hostname (e.g. <a target="_blank" href="http://n-test.mahdishadi.me"><code>n-test.mahdishadi.me</code></a> → n8n).</p>
</li>
<li><p>SSL is handled centrally, automatically.</p>
</li>
</ul>
<h3 id="heading-what-does-cloud-native-buy-us-here">What does “cloud-native” buy us here?</h3>
<ul>
<li><p><strong>Dynamic service discovery</strong><br />  New containers with the correct labels are picked up automatically—no manual configuration reloads are required.</p>
</li>
<li><p><strong>Configuration via labels</strong><br />  Routing logic lives alongside the service definition (in <code>docker-compose.yml</code>), not in a giant central config file.</p>
</li>
<li><p><strong>Automatic TLS</strong><br />  Traefik talks to Let’s Encrypt via ACME, requests and renews certs, and stores them in a file you mount.</p>
</li>
<li><p><strong>Better security &amp; separation</strong><br />  Only Traefik is exposed. n8n is reachable only over an internal Docker network.</p>
</li>
</ul>
<h2 id="heading-the-final-architecture">The Final Architecture</h2>
<p>Here’s the architecture I aimed for:</p>
<ul>
<li><p><strong>Traefik</strong>:</p>
<ul>
<li><p>Listens on ports <code>80</code> and <code>443</code> for HTTP/HTTPS</p>
</li>
<li><p>Has a dashboard on port <code>8080</code></p>
</li>
<li><p>Talks to Docker and automatically discovers services</p>
</li>
<li><p>Gets Let’s Encrypt certs and stores them in <code>acme.json</code></p>
</li>
</ul>
</li>
<li><p><strong>n8n</strong> :</p>
<ul>
<li><p>Runs in Docker on a private network <code>frontend</code></p>
</li>
<li><p>Does <em>not</em> expose its port directly to the host</p>
</li>
<li><p>Is accessible only via Traefik at <a target="_blank" href="https://n-test.mahdishadi.me"><code>https://n-test.mahdishadi.me</code></a></p>
</li>
</ul>
</li>
<li><p><strong>Docker network</strong>:</p>
<ul>
<li>A shared network named <code>frontend</code> that both Traefik and n8n join</li>
</ul>
</li>
</ul>
<hr />
<h2 id="heading-traefik-setup-entrypoints-docker-provider-and-lets-encrypt">Traefik Setup: EntryPoints, Docker Provider, and Let’s Encrypt</h2>
<h3 id="heading-docker-compose-for-traefik">docker-compose for Traefik</h3>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">traefik:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">traefik:v3.6</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">traefik-demo</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"80:80"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"443:443"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"8080:8080"</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">/var/run/docker.sock:/var/run/docker.sock</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./config/traefik.yaml:/etc/traefik/traefik.yaml:ro</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./letsencrypt:/letsencrypt</span>
    <span class="hljs-attr">networks:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">frontend</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>

<span class="hljs-attr">networks:</span>
  <span class="hljs-attr">frontend:</span>
    <span class="hljs-attr">external:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Key ideas:</p>
<ul>
<li><p>The Docker socket lets Traefik read labels on your containers.</p>
</li>
<li><p>The config file is mounted as <code>traefik.yaml</code>.</p>
</li>
<li><p>The <code>./letsencrypt</code> directory is mounted into the container as <code>/letsencrypt</code> for cert storage.</p>
</li>
<li><p>Traefik is on the <code>frontend</code> network so it can reach n8n.</p>
</li>
<li><p>Because <code>frontend</code> is declared <code>external: true</code>, it must exist before the first <code>docker compose up</code>; create it once with <code>docker network create frontend</code>.</p>
</li>
</ul>
<h3 id="heading-traefikyaml-configuration"><code>traefik.yaml</code> configuration</h3>
<pre><code class="lang-yaml"><span class="hljs-attr">global:</span>
  <span class="hljs-attr">checkNewVersion:</span> <span class="hljs-literal">false</span>
  <span class="hljs-attr">sendAnonymousUsage:</span> <span class="hljs-literal">false</span>

<span class="hljs-attr">log:</span>
  <span class="hljs-attr">level:</span> <span class="hljs-string">DEBUG</span>

<span class="hljs-attr">api:</span>
  <span class="hljs-attr">dashboard:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">insecure:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">entryPoints:</span>
  <span class="hljs-attr">web:</span>
    <span class="hljs-attr">address:</span> <span class="hljs-string">":80"</span>
    <span class="hljs-attr">http:</span>
      <span class="hljs-attr">redirections:</span>             <span class="hljs-comment"># redirect HTTP -&gt; HTTPS</span>
        <span class="hljs-attr">entryPoint:</span>
          <span class="hljs-attr">to:</span> <span class="hljs-string">websecure</span>
          <span class="hljs-attr">scheme:</span> <span class="hljs-string">https</span>
  <span class="hljs-attr">websecure:</span>
    <span class="hljs-attr">address:</span> <span class="hljs-string">":443"</span>

<span class="hljs-attr">providers:</span>
  <span class="hljs-attr">docker:</span>
    <span class="hljs-attr">endpoint:</span> <span class="hljs-string">"unix:///var/run/docker.sock"</span>
    <span class="hljs-attr">exposedByDefault:</span> <span class="hljs-literal">false</span>     
    <span class="hljs-attr">network:</span> <span class="hljs-string">frontend</span>          

<span class="hljs-attr">certificatesResolvers:</span>
  <span class="hljs-attr">letsencrypt:</span>
    <span class="hljs-attr">acme:</span>
      <span class="hljs-attr">email:</span> <span class="hljs-string">"mahdishadi99@gmail.com"</span>
      <span class="hljs-attr">storage:</span> <span class="hljs-string">"/letsencrypt/acme.json"</span>
      <span class="hljs-attr">httpChallenge:</span>
        <span class="hljs-attr">entryPoint:</span> <span class="hljs-string">"web"</span>
</code></pre>
<p>Why this config?</p>
<ul>
<li><p><code>exposedByDefault: false</code><br />  Traefik does <strong>not</strong> automatically publish every container. Only containers with <code>traefik.enable=true</code> become reachable. This is safer and more explicit.</p>
</li>
<li><p><code>network: frontend</code><br />  Traefik will use the <code>frontend</code> Docker network when talking to containers. n8n is on the same network, so they can talk.</p>
</li>
<li><p><code>entryPoints.web</code> → <code>websecure</code> redirect<br />  Everything on port 80 gets redirected to HTTPS on port 443. All public traffic ends up encrypted.</p>
</li>
<li><p><code>certificatesResolvers.letsencrypt</code> with <code>httpChallenge</code><br />  Traefik uses the HTTP-01 challenge on the entrypoint <code>web</code> (port 80) to obtain certificates and store them at <code>/letsencrypt/acme.json</code>.</p>
</li>
</ul>
<p>On the host side, I created the ACME storage file:</p>
<pre><code class="lang-bash">mkdir -p letsencrypt
touch letsencrypt/acme.json
chmod 600 letsencrypt/acme.json
</code></pre>
<hr />
<h2 id="heading-n8n-setup-labels-and-environment">n8n Setup: Labels and Environment</h2>
<p>Here’s the n8n stack I ended up with. (Both the <code>frontend</code> network and the <code>n8ntest_data</code> volume are declared <code>external</code>, so create them beforehand with <code>docker network create frontend</code> and <code>docker volume create n8ntest_data</code>.)</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">"3.8"</span>

<span class="hljs-attr">networks:</span>
  <span class="hljs-attr">frontend:</span>
    <span class="hljs-attr">external:</span> <span class="hljs-literal">true</span>  

<span class="hljs-attr">services:</span>
  <span class="hljs-attr">n8n:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">n8nio/n8n:latest</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">n8n</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>
    <span class="hljs-attr">labels:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.enable=true"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N.rule=Host(`n-test.mahdishadi.me`)"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N.entrypoints=websecure"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N.tls.certresolver=letsencrypt"</span>

      <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N-http.rule=Host(`n-test.mahdishadi.me`)"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N-http.entrypoints=web"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N-http.middlewares=N8N-https-redirect"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.middlewares.N8N-https-redirect.redirectscheme.scheme=https"</span>

    <span class="hljs-attr">environment:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_HOST=n-test.mahdishadi.me</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_PORT=5678</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_PROTOCOL=https</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">NODE_ENV=production</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">WEBHOOK_URL=https://n-test.mahdishadi.me</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">GENERIC_TIMEZONE=Asia/Tehran</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_BASIC_AUTH_ACTIVE=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_BASIC_AUTH_USER=admin</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_BASIC_AUTH_PASSWORD=********</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_USER_MANAGEMENT_DISABLED=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_AUTH_DISABLE=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">N8N_DIAGNOSTICS_ENABLED=false</span>

    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">n8ntest_data:/home/node/.n8n</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./local-files:/files</span>
    <span class="hljs-attr">networks:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">frontend</span>

<span class="hljs-attr">volumes:</span>
  <span class="hljs-attr">n8ntest_data:</span>
    <span class="hljs-attr">external:</span> <span class="hljs-literal">true</span>
</code></pre>
<h3 id="heading-why-these-traefik-labels">Why these Traefik labels?</h3>
<ul>
<li><p><code>traefik.enable=true</code><br />  Required because <code>exposedByDefault: false</code>. Without this, Traefik ignores the container completely.</p>
</li>
<li><p>Main HTTPS router:</p>
<pre><code class="lang-yaml">  <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N.rule=Host(`n-test.mahdishadi.me`)"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N.entrypoints=websecure"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N.tls.certresolver=letsencrypt"</span>
</code></pre>
<ul>
<li><p>Matches only requests with <code>Host:</code> <a target="_blank" href="http://n-test.mahdishadi.me"><code>n-test.mahdishadi.me</code></a>.</p>
</li>
<li><p>Uses the <code>websecure</code> entrypoint (port 443).</p>
</li>
<li><p>Uses the <code>letsencrypt</code> resolver defined in <code>traefik.yaml</code> to get the certificate.</p>
</li>
</ul>
</li>
<li><p>HTTP router + redirect middleware:</p>
<pre><code class="lang-yaml">  <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N-http.rule=Host(`n-test.mahdishadi.me`)"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N-http.entrypoints=web"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.routers.N8N-http.middlewares=N8N-https-redirect"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"traefik.http.middlewares.N8N-https-redirect.redirectscheme.scheme=https"</span>
</code></pre>
<p>  This router listens on <code>web</code> (port 80) and applies the <code>N8N-https-redirect</code> middleware, which simply changes the scheme to <code>https</code>. That means:</p>
<ul>
<li><a target="_blank" href="http://n-test.mahdishadi.me"><code>http://n-test.mahdishadi.me</code></a> → 301/308 → <a target="_blank" href="https://n-test.mahdishadi.me"><code>https://n-test.mahdishadi.me</code></a>.</li>
</ul>
</li>
</ul>
<h3 id="heading-why-these-n8n-environment-variables">Why these n8n environment variables?</h3>
<ul>
<li><p><code>N8N_HOST</code>, <code>N8N_PORT</code>, <code>N8N_PROTOCOL</code><br />  n8n must know how it is accessed from the outside so it can generate correct URLs.</p>
</li>
<li><p><code>WEBHOOK_URL</code><br />  Used for webhook URLs n8n exposes; it must match the external HTTPS URL.</p>
</li>
<li><p><code>GENERIC_TIMEZONE=Asia/Tehran</code><br />  So scheduled workflows match your local time.</p>
</li>
<li><p><code>N8N_BASIC_AUTH_*</code><br />  Adds an extra Basic Auth layer in front of the n8n UI. (For a public internet-facing instance, this is very helpful.)</p>
</li>
<li><p><code>N8N_USER_MANAGEMENT_DISABLED</code>, <code>N8N_AUTH_DISABLE</code><br />  Disables n8n’s internal user management/auth, relying instead on the Basic Auth above for this scenario.</p>
</li>
<li><p><code>N8N_DIAGNOSTICS_ENABLED=false</code><br />  Disables telemetry/diagnostics.</p>
</li>
</ul>
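<p>One optional hardening step (not part of the original setup, just a common Compose pattern): keep the Basic Auth credentials out of <code>docker-compose.yml</code> by moving them into a git-ignored <code>.env</code> file next to it; Docker Compose substitutes <code>${VAR}</code> references automatically:</p>
<pre><code class="lang-bash"># .env (add this file to .gitignore)
N8N_BASIC_AUTH_USER=admin
N8N_BASIC_AUTH_PASSWORD=change-me
</code></pre>
<p>Then reference them in the compose file as <code>N8N_BASIC_AUTH_USER=${N8N_BASIC_AUTH_USER}</code> and <code>N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}</code>.</p>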
<hr />
<h2 id="heading-the-problems-i-hit-and-how-they-were-fixed">The Problems I Hit (and How They Were Fixed)</h2>
<h3 id="heading-problem-1-404-from-traefik-no-router-matched">Problem 1: 404 from Traefik – No Router Matched</h3>
<p>One of the first tests I ran was:</p>
<pre><code class="lang-bash">curl -v http://91.107.179.182/ -H 'Host: n-test.mahdishadi.me'
</code></pre>
<p>And I got:</p>
<pre><code class="lang-plaintext">HTTP/1.1 404 Not Found
404 page not found
</code></pre>
<p>This means:</p>
<ul>
<li><p>The request <em>did</em> reach Traefik (good).</p>
</li>
<li><p>But Traefik didn’t find any router that matched <code>Host(`n-test.mahdishadi.me`)</code> on that entrypoint (bad).</p>
</li>
</ul>
<p>Possible causes I checked:</p>
<ul>
<li><p>Wrong or missing labels</p>
</li>
<li><p>Docker provider is not working correctly</p>
</li>
<li><p>Service isn’t on the same network as Traefik</p>
</li>
</ul>
<p>By fixing the Docker provider config (<code>exposedByDefault</code>, <code>network: frontend</code>) and ensuring both Traefik and n8n were attached to the <code>frontend</code> network, Traefik started picking up the routers correctly.</p>
<h3 id="heading-problem-2-http-https-worked-308-but-https-still-failed">Problem 2: HTTP → HTTPS Worked (308), but HTTPS Still Failed</h3>
<p>After adding the HTTP router and redirect middleware, I tested:</p>
<pre><code class="lang-bash">curl -I http://n-test.mahdishadi.me
</code></pre>
<p>Result:</p>
<pre><code class="lang-plaintext">HTTP/1.1 308 Permanent Redirect
Location: https://n-test.mahdishadi.me/
</code></pre>
<p>So the HTTP side looked great:</p>
<ul>
<li><p><code>N8N-http</code> router was active.</p>
</li>
<li><p>Middleware <code>N8N-https-redirect</code> did its job.</p>
</li>
</ul>
<p>But the next step was making sure <strong>HTTPS itself</strong> worked with a valid certificate. That’s where Let’s Encrypt and ACME came in.</p>
<hr />
<h3 id="heading-problem-3-lets-encrypt-acme-errors-http-challenge-not-enabled-resolver-skipped">Problem 3: Let’s Encrypt / ACME Errors – “HTTP challenge not enabled” / Resolver Skipped</h3>
<p>I configured ACME in <code>traefik.yaml</code> and created <code>letsencrypt/acme.json</code> with permissions <code>600</code>.<br />Everything <em>looked</em> correct… but Traefik logs / dashboard complained:</p>
<ul>
<li><p>The HTTP challenge wasn’t enabled, or</p>
</li>
<li><p>The <code>letsencrypt</code> Resolver was being skipped.</p>
</li>
</ul>
<p>So I jumped into the container to see if the ACME storage path really existed:</p>
<pre><code class="lang-bash">docker exec -it traefik-demo sh
/ # ls /letsencrypt
ls: can't access '/letsencrypt': No such file or directory
</code></pre>
<p>Boom. There it was.</p>
<h3 id="heading-root-cause-the-volume-wasnt-mounted-old-container-new-compose">Root cause: the volume wasn’t mounted (old container, new compose)</h3>
<p>Yes, my <code>docker-compose.yml</code> had:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">./letsencrypt:/letsencrypt</span>
</code></pre>
<p>But I had <strong>not</strong> recreated the container after adding that. I had only restarted it.<br />Docker will not magically pick up new volumes on an existing container; you must recreate it.</p>
<p>The fix:</p>
<pre><code class="lang-bash">docker compose down
docker compose up -d
</code></pre>
<p>Then:</p>
<pre><code class="lang-bash">docker exec -it traefik-demo sh
/ # ls /letsencrypt
acme.json
/ # ls -l /letsencrypt/acme.json
-rw------- 1 root root 0 Nov 28 05:51 acme.json
</code></pre>
<p>Now Traefik could actually access <code>/letsencrypt/acme.json</code>, and the ACME resolver started working properly. Once the certificate was obtained, <code>acme.json</code> grew from 0 bytes to a real JSON file with cert data.</p>
<p>At that point, hitting:</p>
<pre><code class="lang-bash">curl -vk https://n-test.mahdishadi.me/
</code></pre>
<p>returned the n8n HTML with a valid Let’s Encrypt certificate, and browsers showed a secure connection.</p>
<hr />
<h2 id="heading-debugging-traefik-n8n-a-practical-checklist">Debugging Traefik + n8n: A Practical Checklist</h2>
<p>This journey turned into a pretty solid debugging checklist:</p>
<h3 id="heading-1-always-start-with-curl-host-header">1. Always start with <code>curl</code> + <code>Host</code> header</h3>
<pre><code class="lang-bash">curl -v http://SERVER_IP/ -H 'Host: n-test.mahdishadi.me'
curl -vk https://n-test.mahdishadi.me/ -H 'Host: n-test.mahdishadi.me'
</code></pre>
<p>What errors mean:</p>
<ul>
<li><p><strong>404</strong> → No matching Traefik router (rule/labels/entrypoint issue).</p>
</li>
<li><p><strong>308</strong> → HTTP redirect is working.</p>
</li>
<li><p><strong>502</strong> → Router exists, but backend (n8n) is unreachable (network/port/container down).</p>
</li>
</ul>
<h3 id="heading-2-use-the-traefik-dashboard">2. Use the Traefik dashboard</h3>
<p>Visit:<br /><a target="_blank" href="http://SERVER_IP:8080/dashboard/"><code>http://SERVER_IP:8080/dashboard/</code></a></p>
<p>Check:</p>
<ul>
<li><p><strong>Routers</strong>:</p>
<ul>
<li><p>Do you see <code>N8N</code> and <code>N8N-http</code>?</p>
</li>
<li><p>Do they have <code>Rule = Host("n-test.mahdishadi.me")</code>?</p>
</li>
<li><p>Are the entrypoints <code>web</code> / <code>websecure</code> as expected?</p>
</li>
</ul>
</li>
<li><p><strong>Services</strong>:</p>
<ul>
<li><p>Is there a service backing <code>N8N</code>?</p>
</li>
<li><p>Is it green (healthy) or red (error)?</p>
</li>
<li><p>What internal IP/port is Traefik using?</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-3-verify-docker-networking">3. Verify Docker networking</h3>
<pre><code class="lang-bash">docker network inspect frontend
docker inspect traefik-demo | grep -A3 '"frontend"'
docker inspect n8n | grep -A3 '"frontend"'
</code></pre>
<p>Both containers must be attached to the same network (<code>frontend</code>) or Traefik can’t reach n8n.</p>
<h3 id="heading-4-double-check-acme-lets-encrypt">4. Double-check ACME / Let’s Encrypt</h3>
<p>On the host:</p>
<pre><code class="lang-bash">ls -l letsencrypt
# acme.json should be -rw------- (chmod 600)
</code></pre>
<p>Inside the container:</p>
<pre><code class="lang-bash">docker exec -it traefik-demo sh
ls -l /letsencrypt/acme.json
</code></pre>
<p>In logs:</p>
<pre><code class="lang-bash">docker logs -f traefik-demo | grep -i acme
</code></pre>
<p>Look for:</p>
<ul>
<li><p>Certificates being requested / obtained</p>
</li>
<li><p>No “resolver skipped” or “challenge not enabled” errors</p>
</li>
</ul>
<h2 id="heading-why-this-architecture-makes-sense-for-n8n">Why This Architecture Makes Sense for n8n</h2>
<p>In the end, this setup gives you:</p>
<ul>
<li><p>A <strong>single, secure entrypoint</strong> (Traefik on ports 80/443).</p>
</li>
<li><p><strong>n8n hidden</strong> behind a Docker network (<code>frontend</code>), not exposed directly to the internet.</p>
</li>
<li><p><strong>Automatic, renewable HTTPS</strong> via Let’s Encrypt.</p>
</li>
<li><p><strong>Forced HTTPS</strong> (HTTP → HTTPS redirect).</p>
</li>
<li><p>Routing defined <strong>right next to the service</strong> (as Docker labels) instead of a giant reverse proxy config.</p>
</li>
<li><p>n8n correctly configured with its external URL and secured with Basic Auth.</p>
</li>
</ul>
<p>And the best part: when you want to add another service—say, an API at <a target="_blank" href="http://api.mahdishadi.me"><code>api.mahdishadi.me</code></a>—you just add another container with a few Traefik labels. Traefik discovers it automatically; no restarts, no big config edits.</p>
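<p>As a sketch of what that looks like (the service name, image, internal port, and certificate-resolver name here are all hypothetical—swap in your own), a new service behind Traefik is just a Compose entry with labels:</p>
<pre><code class="lang-yaml"># Hypothetical Compose snippet: 'api' name, image, port 3000, and
# 'myresolver' are placeholders, not values from this setup.
services:
  api:
    image: my-api:latest
    networks: [frontend]            # same network Traefik is attached to
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.mahdishadi.me`)"
      - "traefik.http.routers.api.entrypoints=websecure"
      - "traefik.http.routers.api.tls.certresolver=myresolver"
      - "traefik.http.services.api.loadbalancer.server.port=3000"

networks:
  frontend:
    external: true
</code></pre>
<p>Bring it up with <code>docker compose up -d</code> and Traefik picks up the router within seconds—no proxy restart needed.</p>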
]]></content:encoded></item><item><title><![CDATA[Want Seamless, Scalable Storage in Kubernetes? Here’s Why Longhorn is Your Best Bet!]]></title><description><![CDATA[Longhorn: Scalable Storage for Kubernetes
What is Longhorn?
Imagine you're setting up a big project on Kubernetes and you need scalable storage that's both easy to use and reliable. That's where Longhorn comes in. Longhorn is a distributed storage sy...]]></description><link>https://blog.mahdishadi.me/want-seamless-scalable-storage-in-kubernetes-via-longhorn</link><guid isPermaLink="true">https://blog.mahdishadi.me/want-seamless-scalable-storage-in-kubernetes-via-longhorn</guid><category><![CDATA[longhorn]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[#kubernetes #container ]]></category><category><![CDATA[Traefik]]></category><dc:creator><![CDATA[Mahdi Shadi]]></dc:creator><pubDate>Wed, 15 Oct 2025 10:08:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1760522738438/f7fc43e0-c696-42f4-8019-1cdb5a09dd09.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-longhorn-scalable-storage-for-kubernetes"><strong>Longhorn: Scalable Storage for Kubernetes</strong></h1>
<h2 id="heading-what-is-longhorn"><strong>What is Longhorn?</strong></h2>
<p>Imagine you're setting up a big project on <strong>Kubernetes</strong> and you need scalable storage that's both easy to use and reliable. That's where Longhorn comes in. Longhorn is a <strong>distributed block storage system</strong> built specifically for Kubernetes. In simple terms, Longhorn is a cloud-native storage solution that lets you easily store data in your cluster, in a distributed way, with high availability.</p>
<h2 id="heading-longhorn-structure"><strong>Longhorn Structure</strong></h2>
<p>Longhorn uses a distributed structure with a few main components:</p>
<ul>
<li><p><strong>Controller</strong>: Manages the cluster's state and coordinates between nodes.</p>
</li>
<li><p><strong>Replica</strong>: Stores copies of data. These replicas are spread across different nodes in the cluster.</p>
</li>
<li><p><strong>Engine</strong>: Processes requests and reads/writes data from various nodes.</p>
</li>
<li><p><strong>Driver</strong>: The interface Kubernetes uses (via CSI) to attach Longhorn volumes to Pods.</p>
</li>
</ul>
<p>This structure allows you to distribute data across multiple nodes, providing <strong>scalability</strong>, <strong>reliability</strong>, and <strong>high performance</strong> within Kubernetes.</p>
<hr />
<h2 id="heading-why-should-you-use-longhorn"><strong>Why Should You Use Longhorn?</strong></h2>
<p>When you're working with Kubernetes, one of the biggest challenges is <strong>data storage</strong>. Most traditional storage systems aren’t optimized for Kubernetes and come with problems like poor scalability, slow performance, and complex management.</p>
<p><strong>But Longhorn solves these problems:</strong></p>
<ol>
<li><p><strong>High Scalability</strong>: Longhorn can easily scale by adding new nodes, increasing your storage capacity.</p>
</li>
<li><p><strong>High Availability (HA)</strong>: If one of your nodes goes down, Longhorn serves the data from replicas on other nodes, so your workloads keep running.</p>
</li>
<li><p><strong>Easy Setup</strong>: Longhorn is designed for Kubernetes, so installation is simple and fast.</p>
</li>
</ol>
<p>If you're looking for a storage system with these features for your projects, Longhorn is one of the best options.</p>
<p><img src="https://longhorn.io/img/diagrams/architecture/how-longhorn-works.svg" alt /></p>
<hr />
<h2 id="heading-pre-requisites-for-installing-longhorn"><strong>Pre-requisites for Installing Longhorn</strong></h2>
<p>Before installing Longhorn on Kubernetes, make sure the following pre-requisites are installed on your cluster nodes. These tools will help Longhorn manage disks and storage resources properly.</p>
<h3 id="heading-1-open-iscsi"><strong>1. open-iscsi</strong></h3>
<p>Longhorn needs <strong>open-iscsi</strong> to connect to disks and storage resources. Install it like this:</p>
<pre><code class="lang-bash">sudo apt-get update
sudo apt-get install -y open-iscsi
sudo systemctl <span class="hljs-built_in">enable</span> --now iscsid
</code></pre>
<h3 id="heading-2-nfs-common"><strong>2. nfs-common</strong></h3>
<p>For NFSv4 support and to share data between nodes, you need <strong>nfs-common</strong>:</p>
<pre><code class="lang-bash">sudo apt-get update
sudo apt-get install -y nfs-common
</code></pre>
<h3 id="heading-3-cryptsetup"><strong>3. cryptsetup</strong></h3>
<p>If you want to encrypt disks, install <strong>cryptsetup</strong>:</p>
<pre><code class="lang-bash">sudo apt-get update
sudo apt-get install -y cryptsetup
</code></pre>
<h3 id="heading-4-device-mapper-persistent-data"><strong>4. device-mapper-persistent-data</strong></h3>
<p>To use <strong>LVM</strong> for disk management, install <strong>device-mapper-persistent-data</strong>:</p>
<pre><code class="lang-bash">sudo apt-get update
sudo apt-get install -y device-mapper-persistent-data
</code></pre>
<h3 id="heading-5-system-utilities"><strong>5. System Utilities</strong></h3>
<p>For managing the system and running various commands, you need utilities like <strong>bash</strong>, <strong>curl</strong>, <strong>findmnt</strong>, <strong>grep</strong>, etc.:</p>
<pre><code class="lang-bash">sudo apt-get update
sudo apt-get install -y bash curl findmnt grep awk blkid lsblk
</code></pre>
<hr />
<h2 id="heading-checking-prerequisites-with-longhornctl-script"><strong>Checking Prerequisites with longhornctl Script</strong></h2>
<p>When you want to install <strong>Longhorn</strong> on Kubernetes, you need to make sure all the necessary prerequisites are installed correctly. To make this easy, you can use the <strong>longhornctl</strong> script, which checks the status of your pre-requisites and tells you if everything's good or if something's missing.</p>
<h3 id="heading-installing-longhornctl"><strong>Installing longhornctl</strong></h3>
<p>First, you need to download <strong>longhornctl</strong>, which acts like a checker that verifies if the required pre-requisites are in place.</p>
<p>To install <strong>longhornctl</strong> on your system, run:</p>
<pre><code class="lang-bash">curl -sSfL -o longhornctl https://github.com/longhorn/cli/releases/download/v1.10.0/longhornctl-linux-amd64
chmod +x longhornctl
</code></pre>
<h3 id="heading-checking-prerequisites-with-longhornctl"><strong>Checking Prerequisites with longhornctl</strong></h3>
<p>Now that you've installed <strong>longhornctl</strong>, you can easily use it to check if the prerequisites for <strong>Longhorn</strong> are installed properly.</p>
<p>Just run this command:</p>
<pre><code class="lang-bash">./longhornctl check preflight
</code></pre>
<p>This will automatically check if things like <strong>open-iscsi</strong>, <strong>nfs-common</strong>, <strong>cryptsetup</strong>, and other needed tools are installed and running.</p>
<h3 id="heading-how-will-the-script-output-look"><strong>How Will the Script Output Look?</strong></h3>
<p>If everything is installed correctly, the output will look something like this:</p>
<pre><code class="lang-bash">Checking prerequisites...
- [PASS] open-iscsi: Installed and running
- [PASS] nfs-common: Installed
- [PASS] cryptsetup: Installed
- [PASS] device-mapper-persistent-data: Installed
</code></pre>
<p>But if any of the prerequisites are missing or not working properly, <strong>longhornctl</strong> will let you know what the problem is and what action to take.</p>
<hr />
<h2 id="heading-installing-longhorn-on-kubernetes"><strong>Installing Longhorn on Kubernetes</strong></h2>
<p>Now that the prerequisites are installed, it's time to install <strong>Longhorn</strong> on Kubernetes.</p>
<h3 id="heading-1-add-the-helm-repository"><strong>1. Add the Helm Repository</strong></h3>
<p>First, add the <strong>Helm</strong> repository for Longhorn:</p>
<pre><code class="lang-bash">helm repo add longhorn https://charts.longhorn.io
helm repo update
</code></pre>
<h3 id="heading-2-create-a-namespace-for-longhorn"><strong>2. Create a Namespace for Longhorn</strong></h3>
<p>To manage Longhorn, create a <code>longhorn-system</code> namespace:</p>
<pre><code class="lang-bash">kubectl create namespace longhorn-system
</code></pre>
<h3 id="heading-3-install-longhorn-using-helm"><strong>3. Install Longhorn Using Helm</strong></h3>
<p>First, create a small values file (<code>longhorn.yaml</code>) to set the number of Longhorn UI replicas:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">longhornUI:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
</code></pre>
<p>Now, install Longhorn using the following command:</p>
<pre><code class="lang-bash">helm install longhorn longhorn/longhorn -n longhorn-system --values longhorn.yaml
</code></pre>
<h3 id="heading-4-check-installation-status"><strong>4. Check Installation Status</strong></h3>
<p>To ensure Longhorn is installed correctly, check the pods:</p>
<pre><code class="lang-bash">kubectl -n longhorn-system get pods
</code></pre>
<hr />
<h1 id="heading-what-i-did">What I Did</h1>
<p><strong>Why Couldn’t I Access Longhorn from Outside Without These Configurations?</strong></p>
<p>When you install <strong>Longhorn</strong>, it’s basically a service running inside your <strong>Kubernetes</strong> cluster, and by default, it’s only accessible from inside the cluster. So, if you want to access the Longhorn dashboard from outside the cluster (for example, from your browser), you won’t be able to, because Kubernetes, by default, only exposes services of type <strong>ClusterIP</strong> internally.</p>
<p>To make services like <strong>Longhorn</strong> accessible from outside the cluster, we need either a <strong>Load Balancer</strong> or an <strong>Ingress Controller</strong>. That's where <strong>Traefik</strong> comes in.</p>
<h2 id="heading-why-did-i-use-traefik"><strong>Why Did I Use Traefik?</strong></h2>
<p>In <strong>Kubernetes</strong>, if we want to route external traffic into the cluster, we need an <strong>Ingress Controller</strong>. <strong>Traefik</strong> is one of the best options for this job. Here’s why I used <strong>Traefik</strong>:</p>
<ol>
<li><p><strong>Excellent Integration with Kubernetes</strong>: Traefik works natively with Kubernetes, automatically detecting and routing traffic to the right services inside the cluster.</p>
</li>
<li><p><strong>SSL/TLS Support</strong>: Traefik can automatically redirect traffic from <strong>HTTP</strong> to <strong>HTTPS</strong> and can get SSL certificates from <strong>Let’s Encrypt</strong> without much hassle.</p>
</li>
<li><p><strong>Auto Discovery</strong>: When a new service is added, Traefik automatically detects it, so we don’t need to manually update the configuration.</p>
</li>
<li><p><strong>Scalability</strong>: Traefik can balance traffic between multiple instances of the same service, making it easy to scale as needed.</p>
</li>
</ol>
<p>So, to access <strong>Longhorn</strong> from outside the cluster, I needed an <strong>Ingress Controller</strong>, and <strong>Traefik</strong> was the perfect choice for this.</p>
<h2 id="heading-why-did-i-add-ssl-to-traefik-and-longhorn"><strong>Why Did I Add SSL to Traefik and Longhorn?</strong></h2>
<p>Now that we're accessing the <strong>Longhorn</strong> service from outside the cluster, it's important to ensure the connection is secure. This is where <strong>SSL/TLS</strong> comes in.</p>
<h3 id="heading-why-ssl"><strong>Why SSL?</strong></h3>
<ul>
<li><p><strong>Security</strong>: Without SSL, any data transmitted between the user and the server can be intercepted. SSL ensures that the connection is encrypted, protecting the data from <strong>Man-in-the-Middle (MITM)</strong> attacks.</p>
</li>
<li><p><strong>Encryption</strong>: SSL ensures that communications with <strong>Longhorn</strong> and any other services inside Kubernetes are encrypted, making the connection safe.</p>
</li>
<li><p><strong>Trust</strong>: Browsers flag plain-HTTP pages as "not secure," and users trust HTTPS endpoints more. This gives the <strong>Longhorn</strong> dashboard the security and credibility it needs.</p>
</li>
</ul>
<h2 id="heading-ssl-with-traefik-technical-overview"><strong>SSL with Traefik: Technical Overview</strong></h2>
<ul>
<li><p><strong>Let’s Encrypt</strong>: I used <strong>Let’s Encrypt</strong>, which automatically issues SSL certificates, making it super easy to implement secure communication without manual certificate management.</p>
</li>
<li><p><strong>SSL Termination</strong>: With Traefik, SSL termination happens at the Traefik level, meaning SSL is handled by Traefik, and then the traffic is sent as HTTP inside the cluster. This simplifies the overall architecture and offloads SSL work from the backend services.</p>
</li>
</ul>
<h2 id="heading-why-couldnt-i-access-longhorn-without-this-configuration"><strong>Why Couldn’t I Access Longhorn Without This Configuration?</strong></h2>
<p>To recap: <strong>Longhorn</strong> ships as a <strong>ClusterIP</strong> service, which Kubernetes only exposes inside the cluster. Without a <strong>LoadBalancer</strong> or an <strong>Ingress Controller</strong> like Traefik in front of it, nobody outside can reach it.</p>
<h3 id="heading-how-does-traefik-solve-this"><strong>How Does Traefik Solve This?</strong></h3>
<ul>
<li><p><strong>IngressRoute</strong> in <strong>Traefik</strong> allows us to route incoming traffic from outside the cluster to internal services like <strong>Longhorn</strong>.</p>
</li>
<li><p><strong>Traefik</strong> automatically detects services and routes traffic accordingly. This means that we can access <strong>Longhorn</strong> from outside the cluster once we configure <strong>Traefik</strong> correctly.</p>
</li>
</ul>
<h3 id="heading-what-makes-longhorn-accessible-from-outside-the-cluster">🔧 <strong>What Makes Longhorn Accessible from Outside the Cluster?</strong></h3>
<p>In the end, by using <strong>Traefik</strong> as an <strong>Ingress Controller</strong>, I was able to route incoming traffic from outside the cluster to <strong>Longhorn</strong>. This made accessing the <strong>Longhorn</strong> dashboard easy and secure using <strong>SSL</strong>.</p>
<h3 id="heading-heres-how-it-worked"><strong>Here’s How It Worked:</strong></h3>
<ol>
<li><p>I created an <strong>IngressRoute</strong> in <strong>Traefik</strong>, telling it to route traffic with the <strong>Host</strong> <a target="_blank" href="http://longhorn.mahdishadi.me"><code>longhorn.mahdishadi.me</code></a> to the <strong>longhorn-frontend</strong> service inside the cluster.</p>
</li>
<li><p>I set up <strong>SSL</strong> for security, ensuring all traffic is redirected from HTTP to HTTPS.</p>
</li>
<li><p>Traefik automatically fetched an SSL certificate from <strong>Let’s Encrypt</strong>.</p>
</li>
</ol>
<h2 id="heading-using-https-with-traefik"><strong>Using HTTPS with Traefik</strong></h2>
<p>To make sure access to Longhorn from outside Kubernetes is secure (HTTPS), you need to use <strong>Traefik</strong>. Here's how to configure SSL and IngressRoute:</p>
<h3 id="heading-1-configuring-traefik-for-http-https-redirect-amp-ingressroute-for-longhorn-access"><strong>1. Configuring Traefik for HTTP → HTTPS Redirect &amp; IngressRoute for Longhorn Access</strong></h3>
<p>To redirect HTTP requests to HTTPS, create a middleware called <strong>https-redirect</strong>, then define an <strong>IngressRoute</strong> on the <code>web</code> entrypoint (port 80) that applies it:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">traefik.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Middleware</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">https-redirect</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">longhorn-system</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">redirectScheme:</span>
    <span class="hljs-attr">scheme:</span> <span class="hljs-string">https</span>
    <span class="hljs-attr">permanent:</span> <span class="hljs-literal">true</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">traefik.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">IngressRoute</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">longhorn-web-http</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">longhorn-system</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">entryPoints:</span> [<span class="hljs-string">web</span>]
  <span class="hljs-attr">routes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">match:</span> <span class="hljs-string">Host(`longhorn.mahdishadi.me`)</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Rule</span>
    <span class="hljs-attr">middlewares:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">https-redirect</span>
    <span class="hljs-attr">services:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">longhorn-frontend</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
</code></pre>
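<p>The route above only covers the <code>web</code> (HTTP) entrypoint. As a sketch, the matching HTTPS route looks like this—note that the TLS secret name (<code>longhorn-tls</code>) is an assumption; use whichever secret your certificate actually lands in:</p>
<pre><code class="lang-yaml">apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: longhorn-web-https
  namespace: longhorn-system
spec:
  entryPoints: [websecure]          # TLS entrypoint (port 443)
  routes:
  - match: Host(`longhorn.mahdishadi.me`)
    kind: Rule
    services:
    - name: longhorn-frontend
      port: 80                      # TLS terminates at Traefik; plain HTTP inside
  tls:
    secretName: longhorn-tls        # assumption: secret holding the Let's Encrypt cert
</code></pre>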
<h3 id="heading-2-configuring-ssl-with-cert-manager"><strong>2. Configuring SSL with cert-manager</strong></h3>
<p>To get an SSL certificate from <strong>Let’s Encrypt</strong>, configure a <strong>ClusterIssuer</strong>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cert-manager.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ClusterIssuer</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">letsencrypt-http01</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">acme:</span>
    <span class="hljs-attr">email:</span> <span class="hljs-string">mahdishadi99@gmail.com</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://acme-v02.api.letsencrypt.org/directory</span>
    <span class="hljs-attr">privateKeySecretRef:</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">letsencrypt-account-key</span>
    <span class="hljs-attr">solvers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">http01:</span>
        <span class="hljs-attr">ingress:</span>
          <span class="hljs-attr">class:</span> <span class="hljs-string">traefik</span>
</code></pre>
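<p>A <strong>ClusterIssuer</strong> on its own doesn't request anything. A <strong>Certificate</strong> resource (the names here are hypothetical) asks cert-manager to obtain the cert and store it in a Kubernetes secret that your HTTPS route can reference:</p>
<pre><code class="lang-yaml">apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: longhorn-cert              # hypothetical name
  namespace: longhorn-system
spec:
  secretName: longhorn-tls         # assumption: secret the TLS route points at
  issuerRef:
    name: letsencrypt-http01       # the ClusterIssuer defined above
    kind: ClusterIssuer
  dnsNames:
  - longhorn.mahdishadi.me
</code></pre>
<p>Watch <code>kubectl -n longhorn-system get certificate</code> until <code>READY</code> is <code>True</code>.</p>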
<p>In the end, you can open the Longhorn UI in your web browser.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760517117647/4d854d81-4f27-4709-9ab2-0dba758d54df.png" alt class="image--center mx-auto" /></p>
<hr />
<h1 id="heading-explain-what-pv-pvc-and-storageclass-about">What PV, PVC, and StorageClass Are About</h1>
<p>When real-world apps are involved, “disk” is no joke. That’s why Kubernetes supports multiple storage models—block, file, and object—whether you’re on the cloud or in your own data center.</p>
<h2 id="heading-what-are-the-plugin-layer-and-csi">What Are the Plugin Layer and CSI?</h2>
<p>There’s a plugin layer in the middle that brokers between Kubernetes and external storage systems. Modern plugins speak <strong>Container Storage Interface (CSI)</strong>—an open standard that lets storage drivers behave consistently across orchestration platforms like K8s.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760516569045/785f4031-d0f9-4233-84fe-c75ae4994f80.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-core-trio-pv-pvc-and-storageclass">The Core Trio: PV, PVC, and StorageClass</h2>
<ul>
<li><p><strong>PersistentVolume (PV):</strong> The in-cluster handle (representation) of an external volume.</p>
</li>
<li><p><strong>PersistentVolumeClaim (PVC):</strong> A pod’s request to use a PV.</p>
</li>
<li><p><strong>StorageClass (SC):</strong> The “plan/tier” definition that automates creating PVs and backend volumes.</p>
</li>
</ul>
<p>In plain English: <strong>PV</strong> is the map to the external volume, <strong>PVC</strong> is the permission slip to use it, and <strong>SC</strong> makes the whole thing <strong>dynamic and automatic</strong>.</p>
<h2 id="heading-a-step-by-step-scenario-longhorn-50gb">A Step-by-Step Scenario (Longhorn – 50GB)</h2>
<ol>
<li><p>A Pod needs 50GB ⇒ it creates a <strong>PVC</strong>.</p>
</li>
<li><p>The <strong>PVC</strong> asks the <strong>StorageClass</strong> to create a new PV + backend volume.</p>
</li>
<li><p>The <strong>SC</strong> calls the Longhorn backend via the Longhorn <strong>CSI</strong> driver.</p>
</li>
<li><p>The <strong>CSI</strong> driver creates a <strong>50GB volume</strong> on Longhorn.</p>
</li>
<li><p><strong>CSI</strong> reports the external volume is ready.</p>
</li>
<li><p>The <strong>SC</strong> creates a <strong>PV</strong> and maps it to that Longhorn volume.</p>
</li>
<li><p>The Pod mounts the PV and starts using it.</p>
</li>
</ol>
<blockquote>
<p>Safety note: K8s prevents multiple pods from writing to the same PV willy-nilly. Also, PVs have a <strong>1:1</strong> relationship with external volumes; you can’t split a single 50GB volume into two 25GB PVs.</p>
</blockquote>
<h2 id="heading-how-do-providersprovisioners-get-added">How Do Providers/Provisioners Get Added?</h2>
<p>Each storage provider usually ships a <strong>CSI driver</strong> via <strong>Helm</strong> or a YAML installer. After installation, the driver’s pods run in the <strong>kube-system</strong> namespace and are ready to serve.</p>
<h2 id="heading-the-persistent-volume-subsystem-in-practice">The Persistent Volume Subsystem in Practice</h2>
<p>Say your external storage exposes these tiers:</p>
<ul>
<li><p><strong>Fast block (flash)</strong></p>
</li>
<li><p><strong>Fast encrypted block (flash)</strong></p>
</li>
<li><p><strong>Slow block (mechanical)</strong></p>
</li>
<li><p><strong>File (NFS)</strong></p>
</li>
</ul>
<p>You create one <strong>StorageClass</strong> per tier so apps can request exactly what they need. If a new app requires 100GB of <strong>encrypted flash</strong>, define a <strong>PVC</strong> in your Pod’s YAML that asks the <strong>sc-fast-encrypted</strong> class for 100GB.</p>
<p>When you apply the manifest, the <strong>SC controller</strong> notices the new PVC and tells the <strong>CSI</strong> driver to provision 100GB of encrypted flash. The external system creates the volume and reports back; CSI informs the SC controller, which maps it to a new <strong>PV</strong>. The PVC then binds to that PV, and your Pod mounts it.</p>
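<p>As a sketch of that flow in YAML—the provisioner name and <code>parameters</code> keys here are entirely hypothetical, since every CSI driver defines its own:</p>
<pre><code class="lang-yaml">apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-fast-encrypted
provisioner: csi.example.com       # hypothetical CSI driver
parameters:                        # driver-specific keys (assumed)
  tier: flash
  encrypted: "true"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: sc-fast-encrypted
  resources:
    requests:
      storage: 100Gi               # the 100GB of encrypted flash
</code></pre>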
<h2 id="heading-handy-yaml-tips">Handy YAML Tips</h2>
<ul>
<li><p><code>apiVersion</code> and <code>kind</code> declare the object’s type and API version.</p>
</li>
<li><p><code>metadata.name</code> is a friendly identifier.</p>
</li>
<li><p>In a <strong>StorageClass</strong>, the <code>provisioner</code> field selects the CSI driver.</p>
</li>
<li><p><strong>StorageClasses are immutable</strong>; if you misconfigure one, create a new one.</p>
</li>
<li><p>People often use <strong>provisioner / plugin / driver</strong> interchangeably.</p>
</li>
<li><p>The <code>parameters</code> block is driver-specific (varies per CSI).</p>
</li>
</ul>
<h2 id="heading-access-modes-how-a-volume-can-be-used">Access Modes (How a Volume Can Be Used)</h2>
<p>Kubernetes supports three:</p>
<ul>
<li><p><strong>ReadWriteOnce (RWO):</strong> One PVC can mount R/W (often from a single node).</p>
</li>
<li><p><strong>ReadWriteMany (RWX):</strong> Multiple PVCs can mount R/W (typically file/object like NFS; block rarely supports this).</p>
</li>
<li><p><strong>ReadOnlyMany (ROX):</strong> Multiple PVCs can mount read-only.</p>
</li>
</ul>
<blockquote>
<p>A PV can be opened in <strong>only one mode</strong> at a time; you can’t mount the same PV as ROX for one PVC and RWX for another simultaneously.</p>
</blockquote>
<h2 id="heading-reclaim-policy-what-happens-when-the-pvc-is-deleted">Reclaim Policy (What Happens When the PVC Is Deleted?)</h2>
<ul>
<li><p><strong>Delete</strong> (default for dynamically created PVs): deleting the PVC deletes the PV <strong>and</strong> the external volume. Risky if you lack backups!</p>
</li>
<li><p><strong>Retain:</strong> keeps the PV and external volume after the PVC is deleted—you’ll clean things up <strong>manually</strong>. Safer.</p>
</li>
</ul>
<h2 id="heading-volumebindingmode-amp-topology-waitforfirstconsumer">VolumeBindingMode &amp; Topology (WaitForFirstConsumer)</h2>
<p>If you set <code>volumeBindingMode: WaitForFirstConsumer</code> in your <strong>StorageClass</strong>, the system waits to create the volume until a <strong>real Pod</strong> that uses the PVC is scheduled. The result? The volume is provisioned in the <strong>same region/zone</strong> as the Pod—avoiding cross-zone/region latency and costs.</p>
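<p>In YAML this is a single field on the StorageClass (the class name here is made up; the provisioner matches the Longhorn driver used later in this post):</p>
<pre><code class="lang-yaml">apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-wait              # hypothetical name
provisioner: driver.longhorn.io
volumeBindingMode: WaitForFirstConsumer   # provision only once a Pod is scheduled
</code></pre>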
<hr />
<p>Alright—let’s wire Longhorn up to your Pods’ PVCs once and for all. The YAML below is copy-paste ready 👇</p>
<h1 id="heading-create-a-pvc-with-longhorn">Create a PVC with Longhorn</h1>
<h2 id="heading-1-prereq-longhorn-should-be-up-and-healthy">1) Prereq: Longhorn should be up and healthy</h2>
<p>First, make sure Longhorn is installed and its Pods are healthy:</p>
<pre><code class="lang-plaintext">kubectl -n longhorn-system get pods
</code></pre>
<p>If everything is <code>Running</code>, move on. If not, fix Longhorn first, then come back.</p>
<h2 id="heading-2-create-a-storageclass-if-you-dont-have-one">2) Create a StorageClass (if you don’t have one)</h2>
<p>Longhorn usually ships a default StorageClass named <code>longhorn</code>. If you want custom settings (like replica count), create one like this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">storage.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">StorageClass</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">longhorn-sc</span>
<span class="hljs-attr">provisioner:</span> <span class="hljs-string">driver.longhorn.io</span>
<span class="hljs-attr">allowVolumeExpansion:</span> <span class="hljs-literal">true</span>   <span class="hljs-comment"># so you can grow the volume later</span>
<span class="hljs-attr">reclaimPolicy:</span> <span class="hljs-string">Delete</span>        <span class="hljs-comment"># delete PV when PVC is deleted (tweak as needed)</span>
<span class="hljs-attr">parameters:</span>
  <span class="hljs-attr">numberOfReplicas:</span> <span class="hljs-string">"3"</span>      <span class="hljs-comment"># how many copies across nodes</span>
  <span class="hljs-attr">staleReplicaTimeout:</span> <span class="hljs-string">"30"</span>  <span class="hljs-comment"># minutes; remove slow/stale replicas</span>
  <span class="hljs-attr">fsType:</span> <span class="hljs-string">"ext4"</span>             <span class="hljs-comment"># filesystem on the volume</span>
</code></pre>
<p>Apply it:</p>
<pre><code class="lang-plaintext">kubectl apply -f longhorn-sc.yaml
</code></pre>
<h2 id="heading-3-create-the-persistentvolumeclaim-pvc">3) Create the PersistentVolumeClaim (PVC)</h2>
<p>Time to request some space. This example asks for 10Gi with RWO:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PersistentVolumeClaim</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-pvc</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">accessModes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">ReadWriteOnce</span>            <span class="hljs-comment"># single-node access; fits most apps</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">storage:</span> <span class="hljs-string">10Gi</span>
  <span class="hljs-attr">storageClassName:</span> <span class="hljs-string">longhorn-sc</span>  <span class="hljs-comment"># or use 'longhorn' if you want the default</span>
</code></pre>
<p>Apply it and check:</p>
<pre><code class="lang-plaintext">kubectl apply -f my-pvc.yaml
kubectl get pvc my-pvc
</code></pre>
<p>If <code>STATUS</code> is <code>Bound</code>, your volume is ready.</p>
<p>Now attach that storage to your container.</p>
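<h2 id="heading-4-mount-the-pvc-in-a-pod">4) Mount the PVC in a Pod</h2>
<p>Here’s a minimal Pod spec that mounts the PVC above — the image and mount path are just placeholders, swap in your own app:</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: nginx:alpine          # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data         # where the volume shows up inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-pvc          # the PVC we created above
</code></pre>
<p>Once the Pod is scheduled, Longhorn attaches the volume to that node and you’ll see it appear in the Longhorn UI.</p>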
<h2 id="heading-5-handy-tips-that-matter">5) Handy tips that matter</h2>
<ul>
<li><p><strong>RWO vs RWX</strong>: RWO works for most cases. For multi-node sharing (RWX), use Longhorn’s Share Manager (you’ll need an RWX-capable StorageClass).</p>
</li>
<li><p><strong>Expanding size</strong>: With <code>allowVolumeExpansion: true</code>, you can grow a PVC later (shrinking isn’t supported). Some apps may need a restart to see the new size.</p>
</li>
<li><p><strong>Troubleshooting</strong> <code>Pending</code>:</p>
<ul>
<li><p>Run <code>kubectl describe pvc my-pvc</code> to see what’s blocking.</p>
</li>
<li><p>Check node/disk capacity.</p>
</li>
<li><p>Inspect Longhorn components: <code>kubectl -n longhorn-system get pods</code> and look at the Longhorn Manager/CSI logs.</p>
</li>
</ul>
</li>
<li><p><strong>Replicas</strong>: More replicas = better resilience to node failure, but higher disk usage. Three is a safe bet.</p>
</li>
</ul>
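<p>As a quick illustration of the expansion tip: growing a PVC is just bumping its requested size (assuming your StorageClass has <code>allowVolumeExpansion: true</code>, like ours does):</p>
<pre><code class="lang-plaintext">kubectl patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
kubectl get pvc my-pvc   # watch CAPACITY update once the resize completes
</code></pre>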
<h2 id="heading-6-overview">6) Overview</h2>
<ol>
<li><p>Longhorn healthy? ✔️</p>
</li>
<li><p>Create (or use) a StorageClass (<code>longhorn</code> or your custom one) ✔️</p>
</li>
<li><p>Define a PVC and mount it in your Pod ✔️</p>
</li>
<li><p>If it’s <code>Pending</code>, describe the PVC and check logs ✔️</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Zero-Downtime Zabbix Migration: How I Moved from 6.4 to 7.0 Without Losing a Single Metric]]></title><description><![CDATA[Overall Architecture: Why It Works
Zabbix Server on Server 2 is the brain of the operation: it caches configuration, processes triggers, and writes data into the database.Frontend (running on Apache) provides dashboards, graphs, and configuration UI....]]></description><link>https://blog.mahdishadi.me/zero-downtime-zabbix-migration</link><guid isPermaLink="true">https://blog.mahdishadi.me/zero-downtime-zabbix-migration</guid><dc:creator><![CDATA[Mahdi Shadi]]></dc:creator><pubDate>Tue, 07 Oct 2025 10:56:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759834320484/e9a9b2f9-19a2-4d08-9cb4-768a35acbae7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-overall-architecture-why-it-works"><strong>Overall Architecture: Why It Works</strong></h2>
<p><strong>Zabbix Server</strong> on Server 2 is the brain of the operation: it caches configuration, processes triggers, and writes data into the database.<br /><strong>Frontend</strong> (running on Apache) provides dashboards, graphs, and configuration UI.<br /><strong>Database</strong> (MySQL/MariaDB) stores history, trends, and metadata.<br /><strong>Zabbix Proxies</strong> sit across sites — while the main server is cut over, they buffer data and later push it in bulk. That’s what makes the whole migration safe.</p>
<hr />
<h2 id="heading-before-starting-backup-and-freeze"><strong>Before Starting: Backup and Freeze</strong></h2>
<p>I scheduled a short <strong>data freeze</strong> to ensure nothing was written during the dump. Then I took a <strong>clean backup</strong> of the Zabbix database and temporarily stopped the old server — so the dump wouldn’t get interrupted.<br />Proxies kept collecting data in their buffers, waiting to sync afterward.</p>
<blockquote>
<p>Be sure to take snapshots of your machines and make sure you have backups before you start.</p>
</blockquote>
<pre><code class="lang-bash"><span class="hljs-comment"># consistent backup + clean handoff</span>
mysqldump --single-transaction -u root -p zabbix &gt; /tmp/zabbix_$(date +%F).sql
sudo systemctl stop zabbix-server

<span class="hljs-comment"># also save configs for safety</span>
<span class="hljs-comment"># /etc/zabbix/* and /etc/zabbix/web/zabbix.conf.php (if local)</span>
</code></pre>
<hr />
<h2 id="heading-building-server-2-apache-php-8-database-zabbix-70"><strong>Building Server 2: Apache + PHP 8 + Database + Zabbix 7.0</strong></h2>
<p>I built the new environment in order: web stack → database → Zabbix.<br />If you get the classic <code>ERROR 2002 (socket)</code> while connecting to MySQL, it simply means the MySQL service isn’t installed or running.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># web stack</span>
sudo apt update
sudo apt install -y apache2 php php-{mysql,mbstring,xml,bcmath,gd,curl}

<span class="hljs-comment"># DB server</span>
sudo apt install -y mysql-server
sudo systemctl <span class="hljs-built_in">enable</span> --now mysql
sudo mysql   <span class="hljs-comment"># enter MySQL shell (Ubuntu uses socket auth)</span>
</code></pre>
<p>Create a dedicated Zabbix database with <strong>utf8mb4_bin</strong> to avoid Unicode comparison issues, and a user with proper privileges:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">DATABASE</span> zabbix <span class="hljs-built_in">CHARACTER</span> <span class="hljs-keyword">SET</span> utf8mb4 <span class="hljs-keyword">COLLATE</span> utf8mb4_bin;
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">USER</span> <span class="hljs-string">'zabbix'</span>@<span class="hljs-string">'localhost'</span> <span class="hljs-keyword">IDENTIFIED</span> <span class="hljs-keyword">BY</span> <span class="hljs-string">'StrongPass!'</span>;
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">ALL</span> <span class="hljs-keyword">PRIVILEGES</span> <span class="hljs-keyword">ON</span> zabbix.* <span class="hljs-keyword">TO</span> <span class="hljs-string">'zabbix'</span>@<span class="hljs-string">'localhost'</span>;
<span class="hljs-keyword">FLUSH</span> <span class="hljs-keyword">PRIVILEGES</span>;
</code></pre>
<p>Then add the <strong>official Zabbix 7.0 repository</strong> and install the packages:</p>
<pre><code class="lang-bash">wget https://repo.zabbix.com/zabbix/7.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_latest_7.0+ubuntu24.04_all.deb
sudo dpkg -i zabbix-release_latest_7.0+ubuntu24.04_all.deb
sudo apt update
sudo apt install zabbix-server-mysql zabbix-frontend-php zabbix-apache-conf zabbix-sql-scripts zabbix-agent2
sudo apt install zabbix-agent2-plugin-mongodb zabbix-agent2-plugin-mssql zabbix-agent2-plugin-postgresql
</code></pre>
<hr />
<h2 id="heading-restoring-data-and-connecting-services"><strong>Restoring Data and Connecting Services</strong></h2>
<p>Once the database was ready, I restored the dump and pointed both the <strong>server</strong> and <strong>frontend</strong> to it. If you rush through this step, you’ll face DB/frontend mismatches later.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># bring back your history and config</span>
mysql -u root -p zabbix &lt; /tmp/zabbix_YYYY-MM-DD.sql

<span class="hljs-comment"># configure DB parameters for the Zabbix server</span>
sudo nano /etc/zabbix/zabbix_server.conf
<span class="hljs-comment"># DBName=zabbix</span>
<span class="hljs-comment"># DBUser=zabbix</span>
<span class="hljs-comment"># DBPassword=StrongPass!</span>
<span class="hljs-comment"># DBHost=127.0.0.1</span>

<span class="hljs-comment"># enable frontend configuration for Apache</span>
sudo a2enconf zabbix-frontend-php
sudo systemctl reload apache2
</code></pre>
<blockquote>
<p>Don't forget to change the IP on the second server back to the previous Zabbix IP or apply your changes to DNS!</p>
</blockquote>
<p>When you start Zabbix 7.0, it automatically upgrades the schema. I monitored the logs during this step.<br />If the UI complains with something like “DB version 6050035, required 7000000,” it means the new server hasn’t fully upgraded the DB yet — or your <strong>binlog trust settings</strong> are blocking it.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># start services and watch logs</span>
sudo systemctl <span class="hljs-built_in">enable</span> --now zabbix-server zabbix-agent apache2
sudo tail -f /var/<span class="hljs-built_in">log</span>/zabbix/zabbix_server.log

<span class="hljs-comment"># if mismatch: verify versions and DB targets, then temporarily trust function creators</span>
zabbix_server -V
grep -E <span class="hljs-string">'DB(Name|User|Host)'</span> /etc/zabbix/zabbix_server.conf
grep -E <span class="hljs-string">'DB(Name|User|Host)'</span> /etc/zabbix/web/zabbix.conf.php
sudo mysql -e <span class="hljs-string">"SET GLOBAL log_bin_trust_function_creators=1;"</span>
sudo systemctl restart zabbix-server
<span class="hljs-comment"># revert after upgrade</span>
sudo mysql -e <span class="hljs-string">"SET GLOBAL log_bin_trust_function_creators=0;"</span>
</code></pre>
<h2 id="heading-breathing-room-for-the-caches"><strong>Breathing Room for the Caches</strong></h2>
<p>For a busy infrastructure, <strong>CacheSize=32M</strong> is a joke. I bumped it to 1 GB (2–4 GB for very large environments). Then I verified <code>/dev/shm</code> was big enough for shared memory — ideally double the cache size. Using the internal item <code>zabbix[rcache,buffer,pused]</code>, I kept cache utilization between 40–60%. Anything above 70% meant it was time to increase it again.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># give config cache real headroom</span>
sudo sed -i <span class="hljs-string">'s/^#\?CacheSize=.*/CacheSize=1024M/'</span> /etc/zabbix/zabbix_server.conf
sudo systemctl restart zabbix-server
sudo tail -f /var/<span class="hljs-built_in">log</span>/zabbix/zabbix_server.log

<span class="hljs-comment"># ensure shared memory is large enough (≥ 2× CacheSize)</span>
df -h /dev/shm
</code></pre>
<h2 id="heading-post-cutover-validation"><strong>Post-Cutover Validation</strong></h2>
<p>Once the cutover was done, I confirmed all <strong>Proxies were Connected</strong>, <strong>Latest Data</strong> was flowing, and <strong>media/actions</strong> worked correctly.<br />If the UI complained about missing PHP extensions or timezone issues, fixing them and reloading Apache solved it quickly.</p>
<p>These small checks make the following day calm and predictable.</p>
<blockquote>
<p><strong>Rollback tip:</strong> If you ever need to revert, just stop services on Server 2, restore the pre-migration backup, point DNS back to Server 1, and review logs before retrying.</p>
</blockquote>
<h3 id="heading-final-thoughts"><strong>Final Thoughts</strong></h3>
<p>This migration reinforced one core lesson: <strong>a Zabbix upgrade isn’t hard if your proxies and database hygiene are solid</strong>. Plan your freeze, verify each step, and keep an eye on logs — that’s 90% of the job.</p>
]]></content:encoded></item><item><title><![CDATA[Monitoring internet Speed with Zabbix and Grafana]]></title><description><![CDATA[Zabbix is an open-source monitoring software tool used for monitoring diverse IT components, including networks, servers, virtual machines, and cloud services. It's designed to provide real-time insights into the performance and availability of vario...]]></description><link>https://blog.mahdishadi.me/monitoring-internet-speed-with-zabbix-and-grafana</link><guid isPermaLink="true">https://blog.mahdishadi.me/monitoring-internet-speed-with-zabbix-and-grafana</guid><category><![CDATA[Zabbix]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[Grafana Monitoring]]></category><category><![CDATA[speedtest]]></category><category><![CDATA[internet]]></category><dc:creator><![CDATA[Mahdi Shadi]]></dc:creator><pubDate>Thu, 18 Apr 2024 09:42:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1713433419811/b3af6c91-c8e1-4c69-81e8-05cceec66da5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Zabbix is an open-source monitoring software tool used for monitoring diverse IT components, including networks, servers, virtual machines, and cloud services. It's designed to provide real-time insights into the performance and availability of various components in your infrastructure.</p>
<p>Monitoring internet speed and quality is one of the important things companies need. In this article, we describe how to monitor internet speed using Zabbix.</p>
<p>Throughout, I use CentOS 8 and Zabbix 6.4.10.</p>
<h2 id="heading-install-dependencies">Install dependencies</h2>
<p>First, we need to install the dependencies: git, curl, wget, Zabbix agent 2, zabbix-sender, and speedtest.</p>
<pre><code class="lang-plaintext">yum update
yum install git -y

rpm -Uvh https://repo.zabbix.com/zabbix/6.4/rhel/8/x86_64/zabbix-release-6.4-1.el8.noarch.rpm
dnf clean all
dnf install zabbix-agent2 zabbix-agent2-plugin-* -y
yum install zabbix-sender -y

curl -s https://packagecloud.io/install/repositories/ookla/speedtest-cli/script.rpm.sh | sudo bash
yum install speedtest -y
</code></pre>
<p>Open the <code>/etc/zabbix/zabbix_agent2.conf</code> file with nano and set your Zabbix server IP and the hostname under which this server will be added to Zabbix:</p>
<pre><code class="lang-plaintext">Server=YOUR_ZABBIX_IP
ServerActive=YOUR_ZABBIX_IP
Hostname=YOUR_SERVER_NAME
</code></pre>
<p>After that, restart and enable Zabbix agent 2:</p>
<pre><code class="lang-plaintext">systemctl restart zabbix-agent2
systemctl enable zabbix-agent2
</code></pre>
<p>Now you can add your server as a host on your Zabbix server.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713420786651/45673ad8-9721-49a6-8164-df175b59eb31.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-clone-the-git-repository">Clone the git Repository</h2>
<p>Then, clone the GitHub repository with the command below:</p>
<pre><code class="lang-plaintext">git clone https://github.com/soloranger/zabbix-internet-Speedtest-Template.git
</code></pre>
<p>cd into the directory and import the <a target="_blank" href="https://github.com/soloranger/zabbix-internet-Speedtest-Template/blob/main/Speedtest_Template.xml">Speedtest_Template.xml</a> Zabbix template via Data Collection &gt; Templates &gt; Import.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713425288052/6c1ac319-b052-4a4c-80b3-61a081a89105.png" alt class="image--center mx-auto" /></p>
<p>cd into the Code directory and run <code>speedtest.sh</code> with the <code>-s</code> argument, which sets the ID of the server you want to use for the speed test:</p>
<pre><code class="lang-bash">chmod +x speedtest.sh
./speedtest.sh -s 58210
</code></pre>
<p>After a few seconds, you can see zabbix_sender sending the data to Zabbix:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713430771078/c831332b-ec4b-4d9d-a7fc-eb53094aa867.png" alt class="image--center mx-auto" /></p>
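<p>To keep the data flowing, you’ll probably want to run the script on a schedule. A minimal cron entry could look like this (the clone path and log file are illustrative — adjust them to where you cloned the repo):</p>
<pre><code class="lang-plaintext"># crontab -e : run the speed test every 30 minutes
*/30 * * * * /opt/zabbix-internet-Speedtest-Template/Code/speedtest.sh -s 58210 &gt;&gt; /var/log/speedtest.log 2&gt;&amp;1
</code></pre>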
<h2 id="heading-grafana-dashboard">Grafana Dashboard</h2>
<p>It's time to see the output in a beautiful dashboard in Grafana :D</p>
<p>Open Grafana -&gt; click the + icon -&gt; select Import dashboard -&gt; select the Config.json file from the Git repository.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713432443794/0468fcd1-0be9-4497-a323-44b5f9161ef4.png" alt class="image--center mx-auto" /></p>
<p>Note: you need to connect your Zabbix to your Grafana first.</p>
<p>After it’s added, you can see your dashboard with this data :)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713432514398/7ae548a6-7b08-4ed2-b4ee-5daa8312f55d.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Everything about SDDC]]></title><description><![CDATA[Migrating to the Cloud has brought unprecedented flexibility to today's companies. However, since today's businesses use these benefits more, they risk creating unwanted complexity in the data center.
To solve this problem, SDDC integrates virtualize...]]></description><link>https://blog.mahdishadi.me/everything-about-sddc</link><guid isPermaLink="true">https://blog.mahdishadi.me/everything-about-sddc</guid><category><![CDATA[SDDC]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[virtualization]]></category><category><![CDATA[storage]]></category><dc:creator><![CDATA[Mahdi Shadi]]></dc:creator><pubDate>Thu, 11 Apr 2024 13:36:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1712842476931/e4a66448-214d-4a1f-9462-60fadbc16fe4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Migrating to the Cloud has brought unprecedented flexibility to today's companies. However, since today's businesses use these benefits more, they risk creating unwanted complexity in the data center.</p>
<p>To solve this problem, SDDC integrates virtualized infrastructure and simplifies resource provisioning and management. When successfully implemented, it is an integrated architecture that considers and coordinates all essential components of the data center.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712841141838/eef44e40-8cb5-4a77-a018-7fff24f15387.png" alt class="image--center mx-auto" /></p>
<p>To understand the benefits and challenges surrounding SDDC, we need to consider how we got here.</p>
<h2 id="heading-traditional-data-center-vs-virtualization"><strong>Traditional Data Center vs Virtualization:</strong></h2>
<p>In its old form, the data center consists of three pillars of IT infrastructure: computing resources, storage, and networking.</p>
<p>Once upon a time, all three were physical resources that were often geographically co-located—for some organizations, this is still true. However, the vast majority of companies have moved at least part of their infrastructure to the cloud, and some resources are virtualized for scalability, flexibility, and cost-effectiveness. Today, virtual resources may take many forms.</p>
<h2 id="heading-sddc-architecture"><strong>SDDC Architecture:</strong></h2>
<p>The SDDC architecture represents a complex approach to data center management and has multiple layers that focus on different functions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712841277843/a6930215-c4db-423b-b4ec-32fcd391ee9f.jpeg" alt class="image--center mx-auto" /></p>
<p><strong>Physical layer:</strong> Computing, storage, and network devices in the data center are placed in this layer. This layer focuses on the performance and operational stability of the devices and provides a stable environment for the entire network and SDDC business operations.</p>
<p><strong>Virtual layer:</strong> Controls access to the physical infrastructure and segregates resources to provide them as services. It is also responsible for monitoring network operations and resource allocation, simplifying data center management, and improving efficiency.</p>
<p><strong>Management layer:</strong> Standardizes management and enables orchestration and automation capabilities, allowing the SDDC to be controlled from a central point.</p>
<p>To better understand SDDC, it is better to learn more about the concepts of SDN and SDS.</p>
<h2 id="heading-sdn"><strong>SDN:</strong></h2>
<p>SDN stands for Software-Defined Networking, a network architecture approach that enables network control and management through application software. The behavior of the entire network and its devices is programmed in a centrally controlled manner through software, using available APIs.</p>
<p>To understand Software-Defined Networking, we need to understand the different planes involved in the network. These planes are:</p>
<ol>
<li><p>Data plane</p>
</li>
<li><p>Control Plane</p>
</li>
</ol>
<h3 id="heading-dataplane"><strong>DataPlane:</strong></h3>
<p>All activities related to the data packets sent by the client belong to this plane:</p>
<p>1. Forwarding packets</p>
<p>2. Fragmenting and reassembling data</p>
<p>3. Replicating packets for multicasting</p>
<h3 id="heading-control-plane"><strong>Control Plane:</strong></h3>
<p>All activities necessary to support the data plane, excluding the handling of the client’s data packets themselves. In other words, it is the brain of the network, which includes:</p>
<p>1. Create routing tables</p>
<p>2. Setting policies for packet management</p>
<h3 id="heading-sdn-importance"><strong>SDN Importance:</strong></h3>
<p>1. <strong>Better network connectivity:</strong> SDN provides much better network connectivity for sales, service and internal communications. SDN also helps to share data faster.</p>
<p>2. <strong>Better deployment of programs:</strong> The deployment of programs, services, and many new business models can be increased by using Software Defined Networking.</p>
<p>3. <strong>Better security:</strong> SDN provides better visibility across the network. Operators can create separate zones for devices that require different levels of security. SDN networks give operators more freedom.</p>
<p>4. <strong>Better control with high speed:</strong> SDN provides better speed than other types of networks by using a software-based controller.</p>
<p>In short, it can be said that SDN acts as a larger umbrella, or a hub, under which the rest of the network technologies sit and integrate on a single platform to reduce traffic and increase the efficiency of data flow.</p>
<h3 id="heading-where-is-sdn-used"><strong>Where is SDN used?</strong></h3>
<p>Companies use SDN to deploy applications faster while reducing deployment and operational costs. SDN allows network administrators to manage and deliver network services from a single location.</p>
<h3 id="heading-sdn-components"><strong>SDN components:</strong></h3>
<p>The three main components that make up SDN are:</p>
<ol>
<li><p>SDN programs: SDN programs send requests to the network through the SDN Controller using APIs.</p>
</li>
<li><p>SDN Controller: SDN Controller collects network information from hardware and sends this information to applications.</p>
</li>
<li><p>SDN network devices: SDN network devices help in sending and processing data</p>
</li>
</ol>
<h3 id="heading-sdn-architecture"><strong>SDN architecture:</strong></h3>
<p>In an old network, each switch has its own data plane and control plane. The control plane exchanges topology information between the various switches and thus builds a forwarding table that decides where an incoming data packet should be forwarded by the data plane.</p>
<p>SDN is an approach through which we separate the Control Plane from the switch and assign it to a centralized unit called the SDN Controller. In this way, the network admin can shape the traffic through the central console without having to access the switches.</p>
<p>The data plane also resides in the switch, and when a packet enters a switch, its forwarding activity is determined based on table entries, which are pre-defined by the controller.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712841577034/020d9a93-2a7b-4f8b-af0c-2e040927b12d.jpeg" alt class="image--center mx-auto" /></p>
<p>A typical SDN architecture consists of three layers.</p>
<p><strong>Application layer:</strong> includes common network applications such as intrusion detection, firewall, load balancer, etc.</p>
<p><strong>Control layer:</strong> includes the SDN Controller, which acts as the brain of the network. It also allows hardware separation for programs written on top of it.</p>
<p><strong>Infrastructure layer:</strong> This layer includes the physical switches that form the data plane and perform the actual movement of data packets.</p>
<p>The layers communicate through sets of APIs: the northbound API between the application and control layers, and the southbound API between the control and infrastructure layers.</p>
<h2 id="heading-sds"><strong>SDS:</strong></h2>
<p>Software-defined Storage or SDS is a storage architecture that separates the storage software from its hardware. Unlike legacy storage systems such as network-attached storage (NAS) or storage area network (SAN), SDS is generally designed to run on any industry-standard or x86 system, eliminating software dependency on proprietary hardware.</p>
<h3 id="heading-advantages-of-sds">Advantages of SDS:</h3>
<p>1. The SDS you choose does not have to come from the same company that sold you the hardware. You can use any commodity x86 server to create an SDS-based storage infrastructure. This means you can maximize the capacity of your existing hardware as your storage needs increase.</p>
<p>2. SDS allows you to adjust the capacity and performance completely independently, according to the needs of the organization.</p>
<p>3. SAN storage devices are limited to the number of nodes they can use. SDS, by its very definition, is not limited in this way and is theoretically infinitely scalable.</p>
<h2 id="heading-advantages-of-software-defined-data-center">Advantages of Software-Defined Data Center:</h2>
<p>Based on unique features, Software-Defined Data Center enables organizations to achieve more flexible and faster deployment, management, and business implementation at lower cost.</p>
<ol>
<li><h3 id="heading-business-agility"><strong>Business agility:</strong></h3>
</li>
</ol>
<p>With infrastructure management, automation, and service orchestration functions, SDDC removes the physical dependency of hardware and enables real-time provisioning of resources, which can manage workloads and respond quickly to business demands.</p>
<p>In fact, deployment and resource-provisioning times can be reduced significantly: providing more storage capacity for applications or modifying the network no longer takes much time.</p>
<ol start="2">
<li><h3 id="heading-increased-scalability"><strong>Increased scalability:</strong></h3>
</li>
</ol>
<p>Cloud-based SDDC allows organizations to scale up or down performance as needed to meet changing demand. Increasing or decreasing IT resources, such as data storage capacity, and network processing power, is very simple. SDDC offers unlimited scalability. No need to worry about freeing up more space to meet growing business needs.</p>
<ol start="3">
<li><h3 id="heading-reduce-costs"><strong>Reduce costs:</strong></h3>
</li>
</ol>
<p>SDDC can help reduce costs. Older data centers require more IT manpower, expensive equipment, time, and maintenance. While in SDDC they can avoid large capital costs. For example, the SDDC pools resources to improve infrastructure utilization and reduce the cost of purchasing new infrastructure. Better utilization also means lower costs for electricity, cooling, etc.</p>
<ol start="4">
<li><h3 id="heading-simple-data-center-management"><strong>Simple data center management:</strong></h3>
</li>
</ol>
<p>SDDC can be managed through a central dashboard, allowing network administrators to monitor data, update systems, and allocate additional storage resources. Compared to legacy data centers, which may require multiple IT tools, applications, and software to manage, SDDC makes data center management much simpler.</p>
]]></content:encoded></item><item><title><![CDATA[Mastering SSH Key-Based Authentication: Tips, Tricks, and Best Practices]]></title><description><![CDATA[SSH key-based authentication provides a sophisticated way to secure remote access to servers and systems, commonly used in Unix-like operating systems. It boasts numerous advantages over traditional password-based methods, including heightened securi...]]></description><link>https://blog.mahdishadi.me/mastering-ssh-key-based-authentication-tips-tricks-and-best-practices</link><guid isPermaLink="true">https://blog.mahdishadi.me/mastering-ssh-key-based-authentication-tips-tricks-and-best-practices</guid><category><![CDATA[ssh]]></category><category><![CDATA[ssh-keys]]></category><category><![CDATA[Linux]]></category><category><![CDATA[networking]]></category><dc:creator><![CDATA[Mahdi Shadi]]></dc:creator><pubDate>Fri, 26 Jan 2024 19:59:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1706297785230/eb2646b4-35be-4b96-969a-d81a8b8524d9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>SSH key-based authentication provides a sophisticated way to secure remote access to servers and systems, commonly used in Unix-like operating systems. It boasts numerous advantages over traditional password-based methods, including heightened security, ease of use, and automation capabilities. Let's delve into the essentials of SSH key-based authentication</p>
<h3 id="heading-generating-key-pairs"><strong>Generating Key Pairs:</strong></h3>
<p>This authentication method relies on asymmetric cryptography. Users create a pair of cryptographic keys: a public key and a private key. The public key is placed on the server(s) the user intends to access, while the private key remains safely stored on the user's local system.</p>
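<p>For example, generating a modern Ed25519 key pair looks like this (the comment is illustrative — use something that identifies you):</p>
<pre><code class="lang-plaintext"># generate an Ed25519 key pair; -C attaches a comment to identify the key
ssh-keygen -t ed25519 -C "you@example.com"
# private key: ~/.ssh/id_ed25519    public key: ~/.ssh/id_ed25519.pub
</code></pre>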
<h3 id="heading-placing-public-keys"><strong>Placing Public Keys:</strong></h3>
<p>Users typically append their public key to the <code>~/.ssh/authorized_keys</code> file on the server they wish to access. This file serves as a repository of authorized public keys allowed to log in to the associated account.</p>
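<p>The easiest way to install your public key on a server is <code>ssh-copy-id</code>; appending it manually over SSH works too (the user and hostname here are placeholders):</p>
<pre><code class="lang-plaintext">ssh-copy-id user@server.example.com

# or, manually:
cat ~/.ssh/id_ed25519.pub | ssh user@server.example.com 'mkdir -p ~/.ssh &amp;&amp; cat &gt;&gt; ~/.ssh/authorized_keys'
</code></pre>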
<h3 id="heading-safeguarding-private-keys"><strong>Safeguarding Private Keys:</strong></h3>
<p>The private key is a critical piece kept secure on the user's local machine. It's imperative to protect it with robust encryption and permissions. Users can bolster security by using passphrase-protected private keys, adding an extra layer of defense.</p>
<h3 id="heading-authentication-process"><strong>Authentication Process:</strong></h3>
<p>When a user tries to log in to the server, the SSH client proves possession of the private key by signing a piece of session data. The server verifies that signature using the authorized public key stored in the <code>authorized_keys</code> file. If the signature checks out, access is granted.</p>
<h3 id="heading-security-benefits"><strong>Security Benefits:</strong></h3>
<ul>
<li><p><strong>Resistance to Brute Force Attacks</strong>: Since the private key isn't transmitted over the network, SSH key-based authentication is resilient against brute-force attacks.</p>
</li>
<li><p><strong>Elimination of Passwords</strong>: This method eradicates the need for password-based logins, mitigating risks associated with password-related attacks like dictionary attacks or phishing.</p>
</li>
<li><p><strong>Logging and Accountability</strong>: Each user's access is tied to their unique key pair, providing a clear audit trail that enhances accountability and simplifies forensic analysis in case of security incidents.</p>
</li>
</ul>
<h3 id="heading-convenience-and-automation"><strong>Convenience and Automation:</strong></h3>
<ul>
<li><p><strong>Single Sign-On (SSO)</strong>: Once set up, SSH key-based authentication enables seamless access to multiple servers without the hassle of entering passwords repeatedly.</p>
</li>
<li><p><strong>Automated Processes</strong>: SSH keys are commonly used in automated scripts and processes, facilitating secure, passwordless interactions between systems.</p>
</li>
</ul>
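<p>The single-sign-on behaviour usually comes from <code>ssh-agent</code>: the key is unlocked once, and every later <code>ssh</code>, <code>scp</code>, or <code>git</code> call reuses it. A sketch (a throwaway, passphrase-less demo key is used so the commands run non-interactively; real keys should carry a passphrase):</p>

```shell
# Start an agent and load a key into it once
ssh-keygen -t ed25519 -N '' -f /tmp/agent_demo_key -q
eval "$(ssh-agent -s)" > /dev/null
ssh-add /tmp/agent_demo_key 2> /dev/null
ssh-add -l   # lists fingerprints of the keys the agent holds
```
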
<h3 id="heading-key-management"><strong>Key Management:</strong></h3>
<ul>
<li><p><strong>Rotation</strong>: Regularly rotating keys is recommended to reduce the risk of compromised keys.</p>
</li>
<li><p><strong>Revocation</strong>: If a private key is compromised or an employee leaves an organization, the associated public key should be removed from the <code>authorized_keys</code> file to revoke access.</p>
</li>
</ul>
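<p>Revocation is just editing <code>authorized_keys</code> — for example, removing a departed user's key by its comment field (the file path and key lines below are illustrative; on a real server the file is <code>~/.ssh/authorized_keys</code>):</p>

```shell
# A demo authorized_keys file with two placeholder entries
AUTH=/tmp/demo_authorized_keys
printf '%s\n' \
  'ssh-ed25519 AAAAC3Nza...KEY1 alice@laptop' \
  'ssh-ed25519 AAAAC3Nza...KEY2 bob@laptop' > "$AUTH"
# Revoke bob's access by dropping his line
grep -v 'bob@laptop' "$AUTH" > "$AUTH.tmp" && mv "$AUTH.tmp" "$AUTH"
cat "$AUTH"
```
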
<h3 id="heading-compatibility"><strong>Compatibility:</strong></h3>
<p>SSH key-based authentication enjoys broad support across various SSH implementations and is compatible with SSH clients and servers on different platforms.</p>
<p>SSH key-based authentication is widely utilized across diverse environments to bolster security, simplify access, and streamline operations. Here's a look at some everyday scenarios where SSH key-based authentication plays a crucial role:</p>
<h3 id="heading-1-managing-servers"><strong>1. Managing Servers:</strong></h3>
<p>System administrators rely on SSH key-based authentication to securely access remote servers for tasks like maintenance, configuration, and troubleshooting. This method mitigates the risks associated with password-based logins, ensuring the protection of critical systems.</p>
<h3 id="heading-2-cloud-infrastructure-control"><strong>2. Cloud Infrastructure Control:</strong></h3>
<p>Organizations managing cloud infrastructure, such as AWS, Google Cloud Platform, or Azure, use SSH key-based authentication to securely connect to virtual machines and cloud instances. This practice limits access to authorized personnel, minimizing the chance of unauthorized entry or data breaches.</p>
<h3 id="heading-3-development-environments"><strong>3. Development Environments:</strong></h3>
<p>Developers utilize SSH key-based authentication to access development servers, version control systems (e.g., GitHub or Bitbucket), and other development tools. This enables secure and seamless access to development environments, fostering collaborative software development processes.</p>
<h3 id="heading-4-continuous-integrationcontinuous-deployment-cicd"><strong>4. Continuous Integration/Continuous Deployment (CI/CD):</strong></h3>
<p>In CI/CD pipelines, SSH key-based authentication is employed to authenticate between various stages of the pipeline (e.g., build, test, deploy). Automation tools like Jenkins or GitLab CI/CD use SSH keys to securely access servers, deploy code, and execute deployment scripts without manual intervention.</p>
<h3 id="heading-5-database-management"><strong>5. Database Management:</strong></h3>
<p>Database administrators (DBAs) employ SSH key-based authentication to securely access databases hosted on remote servers. By configuring SSH key-based authentication, DBAs establish secure connections for tasks such as database backups, migrations, and performance tuning.</p>
<h3 id="heading-6-secure-file-transfers"><strong>6. Secure File Transfers:</strong></h3>
<p>Organizations rely on SSH key-based authentication for secure file transfers between systems using protocols like SCP or SFTP. SSH keys ensure secure authentication and encryption of file transfers, preserving the confidentiality and integrity of transmitted data.</p>
<h3 id="heading-7-access-control-systems"><strong>7. Access Control Systems:</strong></h3>
<p>SSH key-based authentication integrates with access control systems to regulate user access to sensitive resources like financial data or proprietary software. Centralized management of SSH keys and access policies helps enforce least privilege access, enhancing overall security.</p>
<h3 id="heading-8-remote-iot-device-management"><strong>8. Remote IoT Device Management:</strong></h3>
<p>In IoT deployments, SSH key-based authentication enables secure access and management of IoT devices from remote locations. Manufacturers and administrators use SSH keys to authenticate and securely communicate with IoT devices for tasks such as monitoring, configuration, and software updates.</p>
<p>These scenarios underscore the versatility and significance of SSH key-based authentication in ensuring secure and authenticated access to a variety of systems and resources across different industries and use cases.</p>
<h2 id="heading-how-can-we-use-it">How can we use it?!</h2>
<p>Suppose I have two servers, Ubuntu-SRV1 (IP: 192.168.1.43) and Ubuntu-SRV2 (IP: 192.168.1.44). I want to connect from Ubuntu-SRV1 to Ubuntu-SRV2 using an SSH key.</p>
<p>On Ubuntu-SRV1, I ran this command:</p>
<pre><code class="lang-bash">root@ubuntu-SRV1:/home/ubuntu<span class="hljs-comment"># ssh-keygen -t rsa</span>
</code></pre>
<p>As you can see in the picture, after running this command you can either enter a custom path for the key files or press Enter to save them in the default location (<code>~/.ssh</code>).</p>
<p><code>ssh-keygen</code> also asks for an optional passphrase, which adds another layer of security when using the key.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1706294075765/5eee47d2-a036-4ae8-98d0-1e537f572547.png" alt class="image--center mx-auto" /></p>
<p>After generation, the <code>.ssh</code> directory should have mode <code>700</code> and the private key inside it mode <code>600</code>, as shown in the picture.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1706294562140/c17da5ca-444e-45bc-b25d-3dd842039727.png" alt class="image--center mx-auto" /></p>
<p>Inside the <code>.ssh</code> folder we now have a public key and a private key.<br />We repeat the same step on the second server (strictly speaking, SRV2 only needs a key pair of its own if it will also initiate SSH connections):</p>
<pre><code class="lang-bash">root@ubuntusrv2:/home/ubuntu<span class="hljs-comment"># ssh-keygen -t rsa</span>
</code></pre>
<p>Now I need to copy the Ubuntu-SRV1 public key to Ubuntu-SRV2.</p>
<p>To do this, use this command:</p>
<pre><code class="lang-bash">root@ubuntu-SRV1:/home/ubuntu<span class="hljs-comment"># ssh-copy-id ubuntu@192.168.1.44</span>
</code></pre>
<p>This command appends the Ubuntu-SRV1 public key to <code>~/.ssh/authorized_keys</code> on the Ubuntu-SRV2 server.</p>
<p>Now we can SSH from Ubuntu-SRV1 to Ubuntu-SRV2 without entering a password.</p>
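<p>Optionally, an entry in <code>~/.ssh/config</code> on Ubuntu-SRV1 shortens the command further, so that plain <code>ssh srv2</code> works (the alias <code>srv2</code> is our choice here; adjust the key path if you saved yours somewhere custom):</p>

```
# ~/.ssh/config on Ubuntu-SRV1
Host srv2
    HostName 192.168.1.44
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
```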
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1706296898908/e8e55a37-4f24-4532-b953-aefa10161ee9.png" alt class="image--center mx-auto" /></p>
<p>In summary, SSH key-based authentication offers a robust and secure means of remote access, particularly suitable for environments prioritizing security, convenience, and automation. Adhering to best practices in key management and security is vital for maintaining its efficacy.</p>
]]></content:encoded></item></channel></rss>