DevOps

  • Write up visualization or DevOps solutions for crontab


  • Node upgrades
  • Prod and NonProd for k3s
    • What does it take to create LoadBalancer for multi-node k3s cluster?
  • Blue/Green deployments for everything
  • Highest DevOps
    • Add Alerts on AWS budgets
      • Look at new way in aws docs
    • AWS resource reduction to save costs
    • Best k8s management tools
    • asdf maintenance
    • Garbage collect old images in ECR
  • Design prod/non-prod architecture
  • Update and test Data backup / Restore plan
  • Check kube-state-metrics install
  • k8s: Build a Limited access KubeConfig file
    • to have non-root access
    • Maybe only namespace


  • Security
    • farm: enable user namespacing for all env
    • farm: Can we use SOPS to protect values
    • Add asuser to all docker services that mount a fs of the host
    • Create SOPS KMS keys in all accounts and east/west
    • 2FA Google and AWS Cloud
    • Review Security threats of ECR scans
    • SSO solution for products and development
      • Login one place and SSO SAML
      • AWS Token service for awscli
    • Add brotli to backend
      • Seems to be working on prm-stage
    • Set up a quarterly or monthly reminder to do apt updates everywhere.
      • Write an ansible script
    • todo
      • zap proxy security scan
      • chaos monkey
      • django single failure point
      • 2fa on wireguard
      • Logs for 90days
      • IPS/IDS SIEM monitoring?
      • find Glenn security docs
    • r2: 163OxxEi2
  • Network
    • Add compression and headers to all Caddy servers
    • k8s: nginx to apply certs for internal traffic
    • Wireguard - restrict the IPs and ports a client can access
      • ufw or something else
    • Draw network diagrams
      • Enable all nets from one bastion or get multiple wg if working?
    • Draw Pictures: networks, k8s
    • Encrypt in flight. Use NGINX like ascend pods
  • Monitoring
    • How to tell if pod is running out of memory
    • Alerts
      • Get SMS on Prometheus working for Monitoring alerts
        • No, use Slack instead
        • Sending Email and SMS
        • Requirement: Able to send SMS
        • Twilio, MailChimp, …
        • Add reputation measures and permission for prod to send SES
          • Dedicated IP?
      • Send Server down alerts to Slack
      • x-prod eks: Add Slack notification for deployments
      • Send SMS. try sandbox now.
      • monitor: Add alert for wg servers
      • monitor: Add alert for dnsmasq on wg servers
    • Commercial Offerings
      • Sentry
        • What is sentry-cli?
        • Add JIRA connectivity
        • Cost?
      • NewRelic
        • Do we need this?
        • Enable more NewRelic features like apache
        • Pick provider
          • AppDynamics, NewRelic, Elastic, ...
    • Draw pics
      • Data Flow
      • Network
      • Monitoring
    • Database
      • lock monitoring?
    • Turn more Linux auditing on
      • send syslog to remote box
    • Lock or limits on cron jobs
      • Create a python sub-process watcher/limiter
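A minimal sketch of such a watcher/limiter, assuming two limits are wanted: no overlapping runs (a lock file) and a wall-clock timeout. The name `run_limited` and the status strings are hypothetical, not from any existing tool. Note that `subprocess.run` only kills the direct child on timeout, so a job that spawns its own children would need process-group handling on top of this.

```python
import fcntl
import subprocess


def run_limited(cmd, lock_path, timeout_s):
    """Run cmd under an exclusive lock and a wall-clock timeout.

    Returns ("busy", None) if another run holds the lock,
    ("timeout", None) if the job was killed after timeout_s seconds,
    or ("done", returncode) otherwise.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: an overlapping cron run fails fast.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return ("busy", None)
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed and reaped the child here.
            return ("timeout", None)
        return ("done", result.returncode)
        # The lock is released when lock_file is closed on leaving the with block.
```

A crontab entry could then call a small wrapper script around `run_limited(["/path/to/job"], "/tmp/job.lock", 3600)`; the path, lock location, and timeout here are placeholders.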
    • Limit memory for SqlServer docker and dotnet services
      • mem_limit - docker-compose version 2
  • Data / Backups / DR
    • Garbage collect backups
    • Update Disaster Recovery Documents
    • Perform a recovery
    • Data Management
      • Make drawings
      • MindMap of all data models
      • Backup Grafana settings
      • Copy Moodle prod db to dev
      • Need util to make db and schema copying easy
      • Verify all data backups
        • Check all backup data
          • Create a wiki page of all
        • SqlServer
        • Files in App_Data
        • Firebase users and ...
        • Firestore
        • Check Access for S3 and other
    • Encrypt all volumes
      • Clean up old unencrypted volumes/snaps
    • Copy the db replication setup from new rds x-prod-2-read
  • Billing
    • check billing everywhere at least monthly
    • Create a list of links

  • Update EKS clusters
  • Future
    • Setup ansible awx or semaphore on prod cluster or instance
    • Webhook / commit driven gitops
    • flux, atlantis, awx, semaphore, what to use?
      • flux terraform controller
    • Roadmap Page
      • Need to push for workflow defining what needs to happen.
      • Single sheet of DevOps Status
        • RAG list (Red, Amber, Green)
      • FARM developer setup
        • solr, moodle, redis, …
        • Code Style Checker
  • List open source we use in our apps like chrome://credits/
  • AWS storage costs. Get rid of images.
  • Solr Volume
    • needs an alert or number of items
    • monitor
  • Need to Fix bitbucket pipeline pausing on high freq commits
    • if not head-of-tree, push a commit in the one that is building.
  • k8s check role
    # remove from node role
    xx-prod-cert-manager
    eks-efs
    s3-email-prod
    s3-email-stage
    
    # terminal in pod
    pip install awscli
    /home/django/.local/bin/aws sts get-caller-identity
    
    # using node role
    {
        "UserId": "AROAxVC454:i-041xx5e2833c05f9ad8",
        "Account": "2623x809",
        "Arn": "arn:aws:sts::262x09:assumed-role/eksctl-x-prod-us-west-2-nodeg-NodeInstanceRole-1AP5xWB/i-0415x9ad8"
    }
    
    # using service account
    kubectl describe pod -n cert-manager |less
    Environment:
          POD_NAMESPACE:                cert-manager (v1:metadata.namespace)
          AWS_STS_REGIONAL_ENDPOINTS:   regional
          AWS_DEFAULT_REGION:           us-west-2
          AWS_REGION:                   us-west-2
          AWS_ROLE_ARN:                 arn:aws:iam::26x809:role/cert-manager
          AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    
  • k8s: Bring in ascend
    • Use kompose to capture current deployments
    • SMS events
    • Moodle events
  • k8s: Add liveness and readiness probes to all services
    livenessProbe:
      httpGet:
        path: /-/healthy
        port: 9090
        scheme: HTTP
      initialDelaySeconds: 30
      timeoutSeconds: 10
      periodSeconds: 15
      successThreshold: 1
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /-/ready
        port: 9090
        scheme: HTTP
      initialDelaySeconds: 30
      timeoutSeconds: 4
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3
  • k8s: create x-dev k3s cluster and recipe for developers
    • terraform / ansible instance with k3s
    • Route53
  • Framework to do alter scripts
    • ansible?
  • How do I free up disk space
    sudo docker container prune -f && sudo docker image prune -f
    sudo apt-get clean
    sudo apt-get autoremove
    dpkg --get-selections | grep linux-image
    alias ducks='du -cks * | sort -rn | head'
  • Tell the story…
    • Pictures, Stats, …
  • Add Longhorn to k3s?
    • apt install jq nfs-common open-iscsi
  • Temp EKS cluster to gauge cost
  • Growing Disks
    alias dus='du --exclude=efs -ms * 2>/dev/null|sort -n|tail'
    
    # fui
    /var/lib/docker/volumes
    27119	farm-services_database-data
    
    # fui-stage full disk /var/lib/docker/overlay2
    2088	635ddefb71040361a29c81dc71388f78fb91a1ff9a354d463742435ed27605f1
    3037	0cbb349ad2d905748a9e47bb2e5062d003e5ae207109e358a5cb783674137697
    3037	54d4a68fe04e94f43695f721eaabf4c5fe0d018e4cf5aa72f81848e2b745752e
  • Disaster/Recovery (DR)
    • Code
      • Replicate repos from BitBucket to ?
      • Replicate repos from GitHub to ?
      • Create tarballs to archive alongside data archives
    • Configs
      • Capture all env
      • cron to export to ?
      • format of capture files
    • Data
      • Cron to export all data
      • to where?
      • copies?
      • age data backups out and remove
      • backup and restore tools
      • export/import firestore index
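One way to "age data backups out" is a pure selection step that decides what to delete, kept separate from the code that actually deletes. The sketch below assumes backups are identified by a name and a timestamp; `select_expired`, `keep_days`, and `keep_min` are hypothetical names and the defaults are placeholders, not an agreed retention policy.

```python
from datetime import datetime, timedelta


def select_expired(backups, now, keep_days=30, keep_min=3):
    """Return names of backups older than keep_days, newest first.

    backups maps backup name -> datetime the backup was taken.
    The keep_min newest backups are never returned, even if they are old,
    so a stalled backup job cannot cause every copy to be removed.
    """
    newest_first = sorted(backups, key=backups.get, reverse=True)
    protected = set(newest_first[:keep_min])
    cutoff = now - timedelta(days=keep_days)
    return [
        name
        for name in newest_first
        if name not in protected and backups[name] < cutoff
    ]
```

The returned list can be logged or reviewed first and only then passed to whatever removes the files or objects.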
    • DNS
      • GoDaddy
        • cron to constantly save
      • Route53 in all aws accounts
    • Cloud Providers
      • Should all be captured in DevOps code
      • Need to cleanup and organize
      • Tools to ETL the projects
      • AWS
        • export all accounts
      • Google
        • export all projects
  • Postgres Optimizations
    • cache hit ratios
    • Would moving to RDS help?
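For the cache hit ratio, Postgres exposes `blks_hit` and `blks_read` counters per database in the `pg_stat_database` view, and the ratio is `blks_hit / (blks_hit + blks_read)`. A minimal sketch of the arithmetic, assuming the two counters have already been fetched; the function name is hypothetical. Keep in mind that `blks_read` counts misses in Postgres shared buffers only, so some of those reads may still have been served from the OS page cache.

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Return the shared-buffer hit ratio in [0, 1], or None with no traffic.

    blks_hit and blks_read are the cumulative counters from pg_stat_database.
    """
    total = blks_hit + blks_read
    if total == 0:
        # A fresh or idle database has no block accesses to measure yet.
        return None
    return blks_hit / total
```

Because the counters are cumulative since the last stats reset, comparing two samples taken some time apart gives a better picture of current behavior than a single reading.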
  • Base OS updates
    • Need a daily email summary from each system
    • node_exporter has apt updates count
  • monitoring: detect outage of docker or k8s services
  • setup dns server (named,bind9) on bastions
    • Use bind9
    sudo apt update
    sudo apt install bind9 bind9utils bind9-doc -y
    sudo vi /etc/bind/named.conf.options
    options {
        directory "/var/cache/bind";
    
        // Replace with your upstream DNS
        forwarders {
            8.8.8.8;
            1.1.1.1;
        };
    
        dnssec-validation auto;
        listen-on-v6 { any; };
        allow-query { any; };
    };
    
    sudo vi /etc/bind/named.conf.local
    zone "example.com" {
        type master;
        file "/etc/bind/zones/db.example.com";
    };
    sudo mkdir -p /etc/bind/zones
    sudo nano /etc/bind/zones/db.example.com
    $TTL    604800
    @       IN      SOA     ns1.example.com. admin.example.com. (
                             3         ; Serial
                        604800         ; Refresh
                         86400         ; Retry
                       2419200         ; Expire
                        604800 )       ; Negative Cache TTL
    
    ; Name servers
    @       IN      NS      ns1.example.com.
    
    ; A records
    ns1     IN      A       192.168.1.10
    www     IN      A       192.168.1.11
    
    sudo named-checkconf
    sudo named-checkzone example.com /etc/bind/zones/db.example.com
    
    sudo systemctl restart bind9
    sudo systemctl enable bind9
    
    # config nic
    sudo nano /etc/netplan/01-netcfg.yaml
    network:
      version: 2
      ethernets:
        eth0:
          dhcp4: no
          addresses: [192.168.1.10/24]
          gateway4: 192.168.1.1
          nameservers:
            addresses: [192.168.1.10]
    sudo netplan apply
    
    # testing
    dig @192.168.1.10 example.com
    
  • NetBird WireGuard - use AI to get it set up