DevOps
- Write up visualization or DevOps solutions for crontab
- The Phoenix Project - book about devops
- The Unicorn Project - book about digital disruption
Check out backstage for gitops
Prod and NonProd for k3s
- What does it take to create LoadBalancer for multi-node k3s cluster?
- Blue/Green deployments for everything
Highest DevOps
Add Alerts on AWS budgets
- Look at new way in aws docs
- AWS resource reduction to save costs
- Best k8s management tools
- asdf maintenance
- Garbage collect old images in ECR
- GCP Project list - for cleanup
- Upgrade to AWS Instance Metadata Service (IMDS)
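For the ECR garbage-collection item above, a minimal Python sketch of the selection logic only (it does not call AWS). It assumes image records shaped like the `imageDetails` entries from `aws ecr describe-images` (`imageDigest`, `imagePushedAt`); the function name and the `keep_latest` / `max_age_days` thresholds are hypothetical. An ECR lifecycle policy may cover the same need without custom code.

```python
from datetime import datetime, timedelta, timezone


def select_images_to_delete(images, now, keep_latest=10, max_age_days=90):
    """Return digests of images that are outside the newest `keep_latest`
    AND strictly older than `max_age_days`."""
    cutoff = now - timedelta(days=max_age_days)
    newest_first = sorted(images, key=lambda i: i["imagePushedAt"], reverse=True)
    candidates = newest_first[keep_latest:]  # always protect the newest N
    return [i["imageDigest"] for i in candidates if i["imagePushedAt"] < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    # Hypothetical sample data: six images pushed 0, 30, ... 150 days ago.
    images = [
        {"imageDigest": f"sha256:{n:02d}", "imagePushedAt": now - timedelta(days=n * 30)}
        for n in range(6)
    ]
    print(select_images_to_delete(images, now, keep_latest=2, max_age_days=90))
```

Requiring both conditions means a rarely-built repo never loses its last few images, even if they are old.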
Anything helpful here?
- ntfy - https://ntfy.sh/ push to phone - Have to install app. Not much different than using Slack app
- tailscale - https://tailscale.com/ - secure ssh
- Xray-ui
- vuetorrent - https://github.com/VueTorrent/VueTorrent - UI for qBittorrent
- jellyfin - https://jellyfin.org/ - software media system
- radarr - https://github.com/Radarr/Radarr - Movie organizer
- sonarr - https://github.com/SonarrPVR/Sonarr - PVR
- prowlarr - https://github.com/Prowlarr/Prowlarr - Indexer manager
- cloudflare-ddns-updater
- truenas - https://www.truenas.com/
- headscale - https://headscale.net/stable/ - tailscale control server
- passwall2 - movie db?
- https://surfshark.com/ - VPN
- diun -
- https://github.com/plankanban/planka - project tracking
- https://github.com/hoarder-app/hoarder - bookmark everything
- https://github.com/FreshRSS/FreshRSS - self-hosted news aggregator
- https://github.com/usememos/memos - note taker
- https://github.com/daya0576/beaverhabits - habit tracker with goals
- https://github.com/PrivateBin/PrivateBin - pasted data is encrypted
- https://github.com/nocodb/nocodb - Airtable alternative
- https://github.com/paperless-ngx/paperless-ngx - scan and archive docs
- https://github.com/immich-app/immich - store photos and vids
- https://github.com/nextcloud - safe home for data
- https://github.com/filebrowser/filebrowser - up/down files
- https://github.com/gethomepage/homepage - fancy links
- https://github.com/FlareSolverr/FlareSolverr - bypass Cloudflare
- https://github.com/Fallenbagel/jellyseerr - media request and discovery manager
- https://github.com/beancount/fava - beancount UI
- https://github.com/beancount/beancount - double-entry accounting from text files
- Review DevOps Roadmap
- Design prod/non-prod architecture
- Update and test Data backup / Restore plan
- Check kube-state-metrics install
k8s: Build a Limited access KubeConfig file
- to have non-root access
- Maybe only namespace
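One way to approach the limited-access kubeconfig is a namespace-scoped Role bound to a service account, whose token then goes into the kubeconfig. The Python sketch below only builds the two RBAC manifests as JSON (which `kubectl apply -f` accepts). The function name, the `-read` name suffix, and the chosen resources and verbs are assumptions for illustration, not a vetted policy.

```python
import json


def namespace_readonly_rbac(namespace, service_account):
    """Build a read-only Role and RoleBinding limited to one namespace."""
    name = f"{service_account}-read"  # hypothetical naming convention
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": ["", "apps"],
            "resources": ["pods", "pods/log", "services", "deployments"],
            "verbs": ["get", "list", "watch"],  # no create/update/delete
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": service_account,
                      "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role", "name": name},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # "dev" and "dev-viewer" are placeholder values.
    print(json.dumps(namespace_readonly_rbac("dev", "dev-viewer"), indent=2))
```

Because it uses a Role rather than a ClusterRole, the account cannot see anything outside the one namespace.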
Future
- AWS single sign-on
- See if Elastic Container Service (ECS) can help
- Figure out the base cost for gateway and service
- Learn limits of API gateway and Lambdas, SAM Tutorial
- Any ways AI could help?
Security
- farm: enable user namespacing for all env
- farm: Can we use SOPS to protect values
- Add user to all docker services that mount a fs of the host
- Create SOPS KMS keys in all accounts and east/west
- 2FA Google and AWS Cloud
- Review Security threats of ECR scans
SSO solution for products and development
- Login one place and SSO SAML
- AWS Token service for awscli
Add brotli to backend
- Seems to be working on prm-stage
Set up a quarterly or monthly reminder to do apt updates everywhere.
- Write an ansible script
todo
- zap proxy security scan
- chaos monkey
- django single failure point
- 2fa on wireguard
- Logs for 90days
- IPS/IDS SIEM monitoring?
- find Glenn security docs
- r2: 163OxxEi2
Network
- Add compression and headers to all Caddy servers
- k8s: nginx to apply certs for internal traffic
Wireguard - restrict the IPs and ports a client can access
- ufw or something else
Draw network diagrams
- Enable all nets from one bastion or get multiple wg if working?
- Draw Pictures: networks, k8s
- Encrypt in flight. Use NGINX like ascend pods
Monitoring
- How to tell if pod is running out of memory
Alerts
Get SMS on Prometheus working for Monitoring alerts
- No, use Slack instead
- Sending Email and SMS
- Requirement: Able to send SMS
- Twilio, MailChimp, …
Add reputation measures and permission for prod to send SES
- Dedicated IP?
- Send Server down alerts to Slack
- x-prod eks: Add Slack notification for deployments
- Send SMS. try sandbox now.
- monitor: Add alert for wg servers
- monitor: Add alert for dnsmasq on wg servers
Send Text on Monitoring Alerts
- https://prometheus.io/docs/alerting/latest/configuration/
- See <sns_config>
Commercial Offerings
Sentry
- What is sentry-cli?
- Add JIRA connectivity
- Cost?
Atera
NewRelic
- Do we need this?
- Enable more NewRelic features like apache
- Pick provider
- AppDynamics, NewRelic, Elastic, ...
Draw pics
- Data Flow
- Network
- Monitoring
Database
- lock monitoring?
Turn more Linux auditing on
- send syslog to remote box
Lock or limits on cron jobs
- Create a python sub-process watcher/limiter
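A minimal sketch of what such a watcher/limiter could look like, assuming a Linux host (it relies on `fcntl`). It takes an exclusive lock so overlapping cron runs skip instead of piling up, and enforces a wall-clock limit. The function name `run_limited` and the exit code 124 (borrowed from the coreutils `timeout` convention) are choices made here, not an existing tool.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def run_limited(cmd, lock_path, timeout_s):
    """Run cmd under an exclusive file lock with a time limit.

    Returns the command's exit code, 124 if it was killed on timeout,
    or None if another run already holds the lock.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking: fail immediately if a previous run is still going.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        try:
            # subprocess.run kills the child itself when the timeout expires.
            return subprocess.run(cmd, timeout=timeout_s).returncode
        except subprocess.TimeoutExpired:
            return 124
    # The lock is released when lock_file is closed by the with-block.


if __name__ == "__main__":
    lock = os.path.join(tempfile.gettempdir(), "example-job.lock")
    code = run_limited([sys.executable, "-c", "print('job ran')"], lock, timeout_s=30)
    print("skipped: already running" if code is None else f"exit code {code}")
```

In a crontab entry the script would wrap the real command. Memory limits are not covered here and would need something like `resource.setrlimit` or a cgroup.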
Limit memory for SqlServer docker and dotnet services
- mem_limit - docker-compose version 2
docker: Add health check to all containers
Logging
- Loki, Fluentd or some other logging solution
- Loki drivers
k8s: Log Collection
Build, CI/CD
Improve pipelines with conditions
Learn google CloudBuild
Review list of providers on terraform site
Data / Backups / DR
- Garbage collect backups
- Update Disaster Recovery Documents
- Perform a recovery
Data Management
- Make drawings
- MindMap of all data models
- Backup Grafana settings
- Copy Moodle prod db to dev
- Need util to make db and schema copying easy
Verify all data backups
Check all backup data
- Create a wiki page of all
- SqlServer
- Files in App_Data
- Firebase users and ...
- Firestore
- Check Access for S3 and other
Encrypt all volumes
- Clean up old unencrypted volumes/snaps
- Copy the db replication setup from new rds x-prod-2-read
Billing
- check billing everywhere at least monthly
- Create a list of links
Update EKS clusters
Future
- Setup ansible awx or semaphore on prod cluster or instance
- Webhook / commit driven gitops
- flux, atlantis, awx, semaphore, what to use?
- flux terraform controller
Roadmap Page
- Need to push for workflow defining what needs to happen.
Single sheet of DevOps Status
- RAG list (Red, Amber, Green)
FARM developer setup
- solr, moodle, redis, …
- Code Style Checker
Develop Web Hooks to build and deploy
- List open source we use in our apps like chrome://credits/
- AWS storage costs. Get rid of images.
Solr Volume
- needs an alert or number of items
- monitor
Need to Fix bitbucket pipeline pausing on high freq commits
- if not head-of-tree, push a commit in the one that is building.
k8s check role
# remove from node role
xx-prod-cert-manager eks-efs s3-email-prod s3-email-stage
# terminal in pod
pip install awscli
/home/django/.local/bin/aws sts get-caller-identity
# using node role
{
  "UserId": "AROAxVC454:i-041xx5e2833c05f9ad8",
  "Account": "2623x809",
  "Arn": "arn:aws:sts::262x09:assumed-role/eksctl-x-prod-us-west-2-nodeg-NodeInstanceRole-1AP5xWB/i-0415x9ad8"
}
# using service account
kubectl describe pod -n cert-manager |less
Environment:
  POD_NAMESPACE: cert-manager (v1:metadata.namespace)
  AWS_STS_REGIONAL_ENDPOINTS: regional
  AWS_DEFAULT_REGION: us-west-2
  AWS_REGION: us-west-2
  AWS_ROLE_ARN: arn:aws:iam::26x809:role/cert-manager
  AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
k8s: x-prod upgrade
- Fix karpenter
- Need to construct an eksctl config file of current cluster
k8s: Bring in ascend
- Use kompose to capture current deployments
- SMS events
- mooogggle events
k8s: Add liveness and readiness probes to all services
livenessProbe:
  httpGet:
    path: /-/healthy
    port: 9090
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 10
  periodSeconds: 15
  successThreshold: 1
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /-/ready
    port: 9090
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 4
  periodSeconds: 5
  successThreshold: 1
  failureThreshold: 3
k8s: create x-dev k3s cluster and recipe for developers
- terraform / ansible instance with k3s
- Route53
framework to d alter scripts
- ansible?
Create Slack app for notifications
How do I free up disk space
sudo docker container prune -f && sudo docker image prune -f
sudo apt-get clean
sudo apt-get autoremove
dpkg --get-selections | grep linux-image1
alias ducks='du -cks * | sort -rn | head'
Tell the story…
- Pictures, Stats, …
Update all Atlassian runner images
Add Longhorn to k3s?
- apt install jq nfs-common open-iscsi
Temp EKS cluster to gauge cost
Add ECR https://github.com/upmc-enterprises/registry-creds for getting token to k8s
- or setup private repo?
Growing Disks
alias dus='du --exclude=efs -ms * 2>/dev/null|sort -n|tail'
# fui /var/lib/docker/volumes
27119 farm-services_database-data
# fui-stage full disk /var/lib/docker/overlay2
2088 635ddefb71040361a29c81dc71388f78fb91a1ff9a354d463742435ed27605f1
3037 0cbb349ad2d905748a9e47bb2e5062d003e5ae207109e358a5cb783674137697
3037 54d4a68fe04e94f43695f721eaabf4c5fe0d018e4cf5aa72f81848e2b745752e
Disaster/Recovery (DR)
Code
- Replicate repos from BitBucket to ?
- Replicate repos from GitHub to ?
- Create tarballs to archive alongside data archives
Configs
- Capture all env
- cron to export to ?
- format of capture files
Data
- Cron to export all data
- to where?
- copies?
- age data backups out and remove
- backup and restore tools
- export/import firestore index
DNS
GoDaddy
- cron to constantly save
Route53 in all aws accounts
Cloud Providers
- Should all be captured in DevOps code
- Need to cleanup and organize
- Tools to ETL the projects
AWS
- export all accounts
Google
- export all projects
Postgres Optimizations
- cache hit ratios
- Would moving to RDS help?
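For the cache hit ratio item, the usual calculation is hits divided by hits plus disk reads, using the `blks_hit` and `blks_read` counters from `pg_stat_database`. A small Python sketch of the arithmetic follows. The function name is hypothetical, and the 0.99 threshold in the example is a commonly quoted rule of thumb, not a measured target for these databases.

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Fraction of block requests served from shared buffers.

    blks_hit / blks_read are the counters from pg_stat_database.
    Returns None when there has been no activity (avoids dividing by zero).
    """
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total


if __name__ == "__main__":
    # Made-up counter values for illustration.
    ratio = cache_hit_ratio(blks_hit=9900, blks_read=100)
    print(f"cache hit ratio: {ratio:.2%}")
    if ratio is not None and ratio < 0.99:  # assumed rule-of-thumb threshold
        print("consider more shared_buffers or a look at hot queries")
```

The counters are cumulative since the last stats reset, so comparing two samples taken some time apart gives a better picture of current behavior than a single reading.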
Base OS updates
- Need a daily email summary from each system
- node_exporter has apt updates count
- monitoring: detect outage of docker or k8s services
setup dns server (named,bind9) on bastions
- Use bind9
sudo apt update
sudo apt install bind9 bind9utils bind9-doc -y

vi /etc/bind/named.conf.options
options {
    directory "/var/cache/bind";
    // Replace with your upstream DNS
    forwarders {
        8.8.8.8;
        1.1.1.1;
    };
    dnssec-validation auto;
    listen-on-v6 { any; };
    allow-query { any; };
};

vi /etc/bind/named.conf.local
zone "example.com" {
    type master;
    file "/etc/bind/zones/db.example.com";
};

sudo mkdir -p /etc/bind/zones
sudo nano /etc/bind/zones/db.example.com
$TTL 604800
@   IN  SOA ns1.example.com. admin.example.com. (
        3         ; Serial
        604800    ; Refresh
        86400     ; Retry
        2419200   ; Expire
        604800 )  ; Negative Cache TTL
; Name servers
@   IN  NS  ns1.example.com.
; A records
ns1 IN  A   192.168.1.10
www IN  A   192.168.1.11

sudo named-checkconf
sudo named-checkzone example.com /etc/bind/zones/db.example.com
sudo systemctl restart bind9
sudo systemctl enable bind9

# config nic
sudo nano /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: no
      addresses: [192.168.1.10/24]
      gateway4: 192.168.1.1
      nameservers:
        addresses: [192.168.1.10]
sudo netplan apply

# testing
dig @192.168.1.10 example.com
- NetBird WireGuard - use AI to get it set up
