6 min read

Dockerizing Legacy Apps: From Monolith to ECS

How we containerized a monolithic Java application and deployed it to AWS ECS with a full CI/CD pipeline

I recently joined a large entertainment enterprise, and one of the first things I inherited was a monolithic Java application responsible for content delivery workflows. The app had been running on a handful of EC2 instances managed with hand-rolled deployment scripts, and the state of affairs was roughly what you would expect: inconsistent environments, fragile deployments, and the classic "works on my machine" standoff between development and operations.

This is the story of how we containerized it and moved it to AWS ECS.

The Problem

The application was a Spring Boot monolith, roughly 200k lines of Java, talking to an Oracle database, a Redis cache, and a handful of internal REST APIs. Deployments involved SSH-ing into production boxes, pulling a JAR from an S3 bucket, and restarting the service with a shell script that had grown organically over three years.

The issues were predictable:

  • Environment drift: Dev, staging, and production ran different JDK patch versions, different OS-level dependencies, different environment variable conventions.
  • Rollback pain: Rolling back meant finding the previous JAR in S3, hoping the database migrations were backward-compatible, and praying.
  • Scaling limitations: Scaling meant provisioning new EC2 instances, running Ansible playbooks, and waiting 20 minutes.
  • Developer onboarding: New engineers spent their first two days getting the app running locally.

The Dockerfile

We started with a multi-stage Dockerfile. The goal was a reproducible build that produced a minimal runtime image.

# Build stage
FROM maven:3.5-jdk-8 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn package -DskipTests -B

# Runtime stage
FROM openjdk:8-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=builder /app/target/content-workflow-*.jar app.jar
RUN chown -R appuser:appgroup /app
USER appuser

HEALTHCHECK --interval=30s --timeout=3s --start-period=60s \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1

EXPOSE 8080
# Shell wrapper so JAVA_OPTS from the environment is actually applied;
# exec keeps the JVM as PID 1 so it receives SIGTERM directly.
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+UseCGroupMemoryLimitForHeap \
  -jar app.jar"]

A few decisions worth noting:

  • Multi-stage build keeps the final image small. The build stage with Maven and JDK is roughly 800MB; the runtime image with just the JRE and JAR comes in around 130MB.
  • Non-root user is a security baseline. Running as root inside containers is a common oversight.
  • CGroup memory awareness matters on ECS. Without that JVM flag, the JVM does not respect the container memory limits and will happily allocate beyond them, triggering OOM kills.
  • Health check uses Spring Boot Actuator's /actuator/health endpoint. ECS acts on the health check defined in the task definition (which takes precedence over the Dockerfile HEALTHCHECK) to determine container health and trigger replacements.
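One caveat on the memory flag: UseCGroupMemoryLimitForHeap was experimental and was removed in JDK 10. On JDK 10+ (and JDK 8u191+), container limits are detected by default via UseContainerSupport, and the supported knob for sizing the heap is MaxRAMPercentage. A base-image upgrade would swap the entrypoint to something like:

```dockerfile
# JDK 10+ / 8u191+: container limits are detected automatically,
# so size the heap as a fraction of the container limit instead
# of relying on the old experimental flag.
ENTRYPOINT ["java", "-XX:MaxRAMPercentage=75.0", "-jar", "app.jar"]
```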

Local Development with Docker Compose

We wanted developers to run the full stack locally with a single command. Docker Compose made this straightforward.

version: "3.4"
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=local
      - DATABASE_URL=jdbc:oracle:thin:@oracle-db:1521/XEPDB1
      - REDIS_HOST=redis
    depends_on:
      - oracle-db
      - redis

  oracle-db:
    image: oracleinanutshell/oracle-xe-11g:latest
    ports:
      - "1521:1521"
    volumes:
      - oracle-data:/u01/app/oracle

  redis:
    image: redis:4-alpine
    ports:
      - "6379:6379"

volumes:
  oracle-data:

New developer onboarding went from two days to docker-compose up and a coffee break.

CI/CD Pipeline

We used Jenkins (the enterprise standard at the time) to build a pipeline that went from commit to production.

The pipeline stages:

  1. Build and Test: Run the Maven build inside Docker, execute unit and integration tests.
  2. Image Build: Build the Docker image, tag with git SHA and branch name.
  3. Push to ECR: Authenticate with AWS ECR, push the tagged image.
  4. Deploy to Staging: Update the ECS service with the new task definition, wait for stability.
  5. Smoke Tests: Hit staging endpoints, verify health and basic functionality.
  6. Deploy to Production: Rolling update on ECS with a deployment circuit breaker; traffic shifts to the new tasks once their health checks pass.
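The stages above map onto a fairly conventional declarative Jenkinsfile. A sketch, with placeholder registry, region, and script names (the real pipeline had more plumbing), and using the modern aws ecr get-login-password flow for registry auth:

```groovy
pipeline {
  agent any
  environment {
    REGISTRY = '123456789.dkr.ecr.us-east-1.amazonaws.com'  // placeholder account
    IMAGE    = "${REGISTRY}/content-workflow:${env.GIT_COMMIT}"
  }
  stages {
    stage('Build & Test') {
      steps { sh 'mvn -B verify' }
    }
    stage('Image Build') {
      steps { sh "docker build -t ${IMAGE} ." }
    }
    stage('Push to ECR') {
      steps {
        sh "aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ${REGISTRY}"
        sh "docker push ${IMAGE}"
      }
    }
    stage('Deploy to Staging') {
      steps { sh './deploy.sh staging' }   // hypothetical deploy script
    }
    stage('Smoke Tests') {
      steps { sh './smoke-tests.sh staging' }  // hypothetical test script
    }
    stage('Deploy to Production') {
      steps { sh './deploy.sh production' }
    }
  }
}
```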

The ECS task definition looked something like this (in the real pipeline the image tag was the git SHA from the build rather than latest, which kept rollbacks deterministic):

{
  "family": "content-workflow",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "content-workflow",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/content-workflow:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "wget -qO- http://localhost:8080/actuator/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 120
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/content-workflow",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "environment": [
        {"name": "SPRING_PROFILES_ACTIVE", "value": "production"},
        {"name": "JAVA_OPTS", "value": "-Xmx1536m -Xms1024m"}
      ]
    }
  ]
}
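The deploy stages themselves boiled down to a few AWS CLI calls: register a new task definition revision, point the service at it, and wait for the rollout to stabilize. A sketch, with a hypothetical cluster name:

```shell
# Register the new task definition revision, then roll the service onto it.
aws ecs register-task-definition --cli-input-json file://taskdef.json
aws ecs update-service --cluster content-cluster \
  --service content-workflow --task-definition content-workflow
# Block until the new tasks are running and healthy (or the deploy fails).
aws ecs wait services-stable --cluster content-cluster --services content-workflow
```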

Blue-Green Deployment

ECS supports blue-green deployments through CodeDeploy integration, but we went with a simpler approach: ECS rolling updates with deployment circuit breakers.

The key configuration:

{
  "deploymentConfiguration": {
    "maximumPercent": 200,
    "minimumHealthyPercent": 100,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }
}

This ensures ECS spins up new tasks before draining old ones. If the new tasks fail health checks, the circuit breaker triggers an automatic rollback. No more 3 AM pages for broken deployments.
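The same settings can be applied to an existing service from the CLI using shorthand syntax (cluster name hypothetical):

```shell
aws ecs update-service --cluster content-cluster --service content-workflow \
  --deployment-configuration \
  'maximumPercent=200,minimumHealthyPercent=100,deploymentCircuitBreaker={enable=true,rollback=true}'
```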

Service Discovery

We used AWS Cloud Map for service discovery, which integrates natively with ECS. Each service registers itself in a private DNS namespace, so inter-service communication uses DNS names instead of load balancer endpoints or hardcoded IPs.

content-workflow.internal.corp → 10.0.1.45 (task A)
                               → 10.0.1.78 (task B)

This eliminated a class of configuration management headaches and made the system topology self-describing.
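The wiring lives on the ECS service definition: each service points at a Cloud Map service registry, and ECS keeps the DNS records in sync as tasks start and stop. A fragment, with a placeholder registry ARN:

```json
{
  "serviceRegistries": [
    {
      "registryArn": "arn:aws:servicediscovery:us-east-1:123456789:service/srv-placeholder"
    }
  ]
}
```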

Graceful Shutdown

One issue that bit us in production: ECS sends a SIGTERM to your container and waits 30 seconds (configurable via stopTimeout) before sending SIGKILL. If your application does not handle SIGTERM properly, in-flight requests get dropped.

Spring Boot handles this reasonably well out of the box, but we added explicit shutdown hooks for draining the connection pool and flushing metrics:

@PreDestroy
public void onShutdown() {
    log.info("Received shutdown signal, draining connections...");
    connectionPool.close();
    metricsReporter.flush();
    log.info("Graceful shutdown complete");
}

We also increased the ECS stopTimeout to 60 seconds and configured the ALB target group deregistration delay to match, giving long-running requests time to complete.
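The two timeouts live in different places: stopTimeout is a per-container field in the ECS task definition, while the deregistration delay is an attribute on the ALB target group. Setting the latter from the CLI (target group ARN supplied via an environment variable here):

```shell
# Give in-flight requests the same 60s window the container gets before SIGKILL.
aws elbv2 modify-target-group-attributes \
  --target-group-arn "$TARGET_GROUP_ARN" \
  --attributes Key=deregistration_delay.timeout_seconds,Value=60
```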

Lessons Learned

Start with the Dockerfile, not the orchestrator. We spent too much time early on debating ECS vs Kubernetes vs Elastic Beanstalk. The real value was in containerizing the app itself. Once we had a good Dockerfile and Docker Compose setup, the orchestrator choice became less fraught.

Monitor container-specific metrics. Traditional VM monitoring does not translate directly. CPU and memory metrics at the container level behave differently than at the host level. We added Prometheus metrics via Micrometer and shipped them to CloudWatch.

Database migrations need a strategy. We used Flyway, and we enforced a rule: every migration must be backward-compatible with the previous application version. This made rollbacks safe and deployments independent of schema changes.
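In practice this rule pushes you toward expand/contract migrations: additive changes ship first, and destructive ones only land once no running version depends on the old schema. A hypothetical pair of Flyway migrations against the Oracle schema (table and column names invented for illustration):

```sql
-- V12__add_delivery_status.sql  (additive: the previous release simply ignores it)
ALTER TABLE content_item ADD delivery_status VARCHAR2(32) DEFAULT 'PENDING';

-- V13__drop_legacy_status.sql  (ships one release later, once nothing reads the column)
ALTER TABLE content_item DROP COLUMN legacy_status;
```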

Image size matters more than you think. Our first Docker image was 1.2GB. Pulling that across multiple ECS tasks during a deployment added real latency. The multi-stage build cut it to 130MB. Alpine base images are worth the minor compatibility headaches.

The entire migration took about six weeks from first Dockerfile to production traffic on ECS. Deployments went from a 45-minute manual process to a 12-minute automated pipeline. More importantly, they went from something we dreaded to something we barely thought about.

That is the real measure of good infrastructure: when it disappears from your daily concerns.
