+++
title = 'FireMon'
date = 2013-12-01
[extra]
title = 'Principal Engineer'
years = '2013–Present'
+++

FireMon is a software development company based in Overland Park, KS. As the
System Architect, I focus on building a scalable, easy-to-use platform for
delivering FireMon software to customers. FMOS, the FireMon Operating System,
is both a mechanism for delivering the FireMon
<abbr title="Security Intelligence Platform">SIP</abbr> to customers and a
collection of tools for deploying and managing the software in a wide array of
environments, ranging from a single server to massive multi-node ecosystems.

<!-- more -->

# FMOS: FireMon Operating System

## Ansible Configuration Policy

* Configuration policy for deploying all FireMon software and third-party
  dependencies
* Support for single-server and distributed deployments
* Automatically computes JVM heap sizes for each process based on available
  resources
* Configures Elasticsearch in single-node or clustered mode
* Configures PostgreSQL with optional replication to standby servers
* Configures the kernel NFS server and client to share filesystem data between
  machines
* Configures FireMon application server processes, including connection and
  authentication information for PostgreSQL and Elasticsearch
* Configures the strongSwan IPsec/IKEv2 key management daemon for
  opportunistic encryption of Elasticsearch communication
* Configures operating system login and password policy, including support for
  external authentication providers such as LDAP or Kerberos
* Sets up *collectd* and Carbon (the Graphite storage engine) to track system
  performance metrics, optionally replicating metrics data to FireMon-managed
  central storage for real-time review
* Optionally configures *rsyslog* to send log messages to remote destinations
  over UDP, TCP, or TCP with TLS
* Configures *tmux* to automatically launch at user login


## Deployment and Maintenance Tools

* Python software for configuring and managing machines running FireMon
  software (`fmos` command)
* Critical functionality for application maintenance:
  * Updating OS and software
  * Backing up and restoring data
  * Capturing diagnostic information for technical support
  * Modifying configuration settings
  * Managing server certificates and private keys
* D-Bus daemon to handle privileged operations
* Unprivileged command-line interface
* HTTP API developed with FastAPI


## Generation II Platform

* Based on CentOS 7
* Full-disk encryption using LUKS
* Anaconda installer with a custom addon for generating a machine-specific LUKS
  master key passphrase
* Kickstart script for fully automated installation
* Used Koji to build RPM packages for first- and third-party software
* Distribution included Ansible for configuration management
* systemd units for controlling FireMon application services


## Generation III Platform

* Based on CentOS 7, later CentOS Stream 8
* Immutable SquashFS root filesystem image
* Full-disk encryption using LUKS
* Custom Dracut modules to verify the image's OpenPGP signature, mount it as the
  root filesystem, and initialize the LUKS-encrypted persistent data volume
  with LVM
* Custom SELinux policy to confine FireMon software


## Cloud-Hosted Public Services

* FMOS Support File Upload Service
  * Deployed to AWS EC2 using Elastic Beanstalk
  * HTTP API for resumable file uploads using a content-addressable, chunked
    data storage system
  * Allows FireMon customers to easily upload FMOS diagnostic packages to
    FireMon support for analysis
* FMOS News Service
  * Deployed to AWS EC2 using Elastic Beanstalk
  * HTTP API for delivering important notifications (e.g., release
    announcements and vulnerability disclosures) to FireMon customers through
    the FMOS command-line interface
* VictoriaMetrics
  * Deployed to AWS EC2 using Terraform
  * Clustered deployment for scalability and reliability
  * Receives Prometheus metrics (via the remote write protocol) from FMOS
    instances deployed at customer sites and in the cloud


# DevOps Team Lead

* Managed all resources exclusively through Ansible configuration management
* Deployed and maintained hundreds of internal and cloud systems running
  RHEL/CentOS Linux (5, 6, 7, 8)
* PXE provisioning of all on-premises virtual machines
* All machines joined to the Active Directory domain using Samba/Winbind
* Zabbix system monitoring
  * Agent installed on all machines
  * Collects system availability and performance metrics
  * Custom templates for basic application availability metrics
* Atlassian Bitbucket (Stash) Git repository host
* Jenkins continuous integration platform
  * Integrated with Bitbucket for project discovery and change events
  * Jobs configured using `Jenkinsfile` pipeline definition files within
    repositories
  * Build environments defined as container images; jobs run in Docker
    containers on Jenkins agents
  * Ephemeral agents provisioned with the vSphere plugin from various virtual
    machine templates for different project needs
* Application data backups using *BURP*: Back Up and Restore Program
* Graylog log aggregation
  * All machines send system and application logs via syslog over TLS, using
    *rsyslog*
  * Custom pipelines for parsing and indexing fields from log messages
  * Alerts based on log message contents and frequency
* Prometheus application monitoring
  * VictoriaMetrics time-series database
  * Prometheus exporters for many applications (Jenkins, Bitbucket,
    Elasticsearch, GlusterFS, HAProxy, Nginx, Redis)
  * Custom Grafana dashboards for status display and performance analysis
  * *collectd* receives system performance metrics from ephemeral Jenkins
    worker nodes via multicast and exposes them as Prometheus metrics
  * Alertmanager notifications to e-mail and Slack for application
    availability and performance alerts
* HashiCorp Vault HA cluster for secret storage, including Jenkins credentials


# Internal Tools

## FMOS Web Tools

* Internal application used by software developers and support agents
* Multi-tiered architecture with multiple nodes at each tier to avoid any
  single point of failure
  * Application Server Tier: Python 3.6/FastAPI
  * Storage Tier: GlusterFS
  * Index Tier: Elasticsearch
  * Cache Tier: Redis
  * Message Tier: RabbitMQ
  * Worker Tier: Python 3.6/Celery
  * Ingress: HAProxy
* User Interface: TypeScript/Vue+Vuetify


## PR Bot

* Implements a webhook for Atlassian Bitbucket (Stash)
* Reacts to new and updated Pull Requests
* Automatically checks Git commits and changed code to enforce the style guide
  and other project-specific requirements
* Adds comments to Pull Requests indicating check results, and sets the PR
  status to Approved or Needs Work
* Written in Python with no external dependencies


## QEMU VM Log Socket Proxy

* Component of FMOS end-to-end tests running on-premises using QEMU/libvirt
* Uses kernel *inotify(7)* events to detect virtual machine log channel socket
  files appearing on the VM host
* Automatically connects to sockets as they appear
* Receives all data from channel sockets and writes it to a file in the
  libvirt storage pool
* Written in Rust


## FMOS ISO Writer

* Internal application used by development and QA teams to write FMOS installer
  images to USB disks attached to remote physical appliances
* Accessible via a purpose-built, ultra-minimal Linux distribution (kernel and
  BusyBox only) delivered by PXE network boot
* Written in Rust


## Environment Launcher

* Internal application that allows FireMon employees to quickly launch FireMon
  SIP environments as containers running in Kubernetes
* Allows users to choose specific feature branches of each front-end and
  back-end component to facilitate testing of work in progress
* Written in Rust using the Rocket web framework


# FireMon-as-a-Service

* Cloud-hosted FireMon software deployment
* Deployed backend infrastructure for federated authentication using OpenLDAP
  and MIT Kerberos
* Followed Infrastructure-as-Code principles using Ansible
* Developed Authgate, a custom integrated authentication solution for FireMon
  Security Manager software that provides full-featured account and credential
  management using the Kerberos protocol
* Python bindings for *mit-kerberos* using Cython