- 
                Notifications
    You must be signed in to change notification settings 
- Fork 162
Description
I'm experiencing difficulties in accessing a ScrapyRT service running on specific ports within a Kubernetes pod. My setup includes a Kubernetes cluster with a pod running a Scrapy application, which uses ScrapyRT to listen for incoming requests on designated ports. These requests are intended to trigger spiders on the corresponding ports.
Despite correctly setting up a Kubernetes service and referencing the Scrapy pod in it, I'm unable to receive any incoming requests to the pod. My understanding is that in Kubernetes networking, a service should be created first, followed by the pod, allowing inter-pod communication and external access through the service. Is this correct?
Below are the relevant configurations:
scrapy-pod Dockerfile:
# Use Ubuntu as the base image
FROM ubuntu:latest
# Avoid prompts from apt
ENV DEBIAN_FRONTEND=noninteractive
# # Update package repository and install Python, pip, and other utilities
RUN apt-get update && \
    apt-get install -y curl software-properties-common iputils-ping net-tools dnsutils vim build-essential python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
# Install nvm (Node Version Manager) - EXPRESS
ENV NVM_DIR /usr/local/nvm
ENV NODE_VERSION 16.20.1
RUN mkdir -p $NVM_DIR
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
# Install Node.js and npm - EXPRESS
RUN . "$NVM_DIR/nvm.sh" && nvm install $NODE_VERSION && nvm alias default $NODE_VERSION && nvm use default
# Add Node and npm to path so the commands are available - EXPRESS
ENV NODE_PATH $NVM_DIR/versions/node/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/v$NODE_VERSION/bin:$PATH
# Install Yarn - EXPRESS
RUN npm install --global yarn
# Set the working directory in the container to /usr/src/app
WORKDIR /usr/src/app
# Copy the current directory contents into the container at /usr/src/app
COPY . .
# Install any needed packages specified in requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy the start_services.sh script into the container
COPY start_services.sh /start_services.sh
# Make the script executable
RUN chmod +x /start_services.sh
# Install any needed packages specified in package.json using Yarn - EXPRESS
RUN yarn install
# Expose all the necessary ports
EXPOSE 14805 14807 12085 14806 13905 12080 14808 8000
# Define environment variable - EXPRESS
ENV NODE_ENV production
# Run the script when the container starts
CMD ["/start_services.sh"]
start_services.sh:
#!/bin/bash
# Start ScrapyRT instances on different ports
scrapyrt -p 14805 &
scrapyrt -p 14807 &
scrapyrt -p 12085 &
scrapyrt -p 14806 &
scrapyrt -p 13905 &
scrapyrt -p 12080 &
scrapyrt -p 14808 &
# Keep the container running since the ScrapyRT processes are in the background
tail -f /dev/null
service yaml file:
apiVersion: v1
kind: Service
metadata:
  name: scrapy-service
spec:
  selector:
    app: scrapy-pod
  ports:
    - name: port-14805
      protocol: TCP
      port: 14805
      targetPort: 14805
    - name: port-14807
      protocol: TCP
      port: 14807
      targetPort: 14807
    - name: port-12085
      protocol: TCP
      port: 12085
      targetPort: 12085
    - name: port-14806
      protocol: TCP
      port: 14806
      targetPort: 14806
    - name: port-13905
      protocol: TCP
      port: 13905
      targetPort: 13905
    - name: port-12080
      protocol: TCP
      port: 12080
      targetPort: 12080
    - name: port-14808
      protocol: TCP
      port: 14808
      targetPort: 14808
    - name: port-8000
      protocol: TCP
      port: 8000
      targetPort: 8000
  type: ClusterIP
deployment yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scrapy-deployment
  labels:
    app: scrapy-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scrapy-pod
  template:
    metadata:
      labels:
        app: scrapy-pod
    spec:
      containers:
      - name: scrapy-pod
        image: mydockerhub/privaterepository-scrapy:latest
        imagePullPolicy: Always  
        ports:
        - containerPort: 14805
        - containerPort: 14806
        - containerPort: 14807
        - containerPort: 12085
        - containerPort: 13905
        - containerPort: 12080
        - containerPort: 8000
        envFrom:
        - secretRef:
            name: scrapy-env-secret
        - secretRef:
            name: express-env-secret
      imagePullSecrets:
      - name: my-docker-credentials 
scrapy-pod's logs in Powershell terminal:
> k logs scrapy-deployment-56b9d66858-p59gs -f
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Site starting on 12080
2024-01-09 21:53:27+0000 [-] Site starting on 14808
2024-01-09 21:53:27+0000 [-] Site starting on 14805
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f4cbdf44d60>
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fef9b620a00>
2024-01-09 21:53:27+0000 [-] Site starting on 13905
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Site starting on 14807
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f0892ff4df0>
2024-01-09 21:53:27+0000 [-] Site starting on 14806
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f00d3b99000>
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fba9e321180>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f1782514f10>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Site starting on 12085
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fb2054cd060>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
Issue:
Despite these configurations, no requests seem to reach the Scrapy pod. Logs from kubectl logs show that ScrapyRT instances start successfully on the specified ports. However, when I send requests from a separate debug pod running a Python Jupyter Notebook, they succeed for other pods but not for the Scrapy pod.
Question:
How can I successfully connect to the Scrapy pod? What might be preventing the requests from reaching it?
Any insights or suggestions would be greatly appreciated.