Jonathan Reyes - Innovation Engineering Leader
Senior Principal Engineer
Establishing systems of innovation
Empowering human talent
Constantly learning...

About
Senior Principal Engineer, leading transformative infrastructure and platform initiatives that enable rapid, reliable software delivery at scale.
With a foundation in innovation and entrepreneurship since 2005, I specialize in building resilient, impactful systems that bridge the gap between cutting-edge technology and business value.
Core Expertise
- Innovation
- Cross-functional Empowerment
- Platform Engineering
- Kubernetes & Cloud
- Event Architecture
- Developer Experience
- Microservices
Certifications
- AWS Solutions Architect
- AWS ML Specialty
- Certified ScrumMaster
Languages
- English (Native)
- Arabic (Professional)
Experience
Most Recent Technical Initiatives
01 Enterprise AI Platform Architecture & Governance Program
Context
Company needed scalable AI capabilities across 10+ teams while managing compliance, costs, security, and provider lock-in risks as AI adoption accelerated.
Action
Architected enterprise AI platform using Genkit framework with standardized agent lifecycle management. Deployed centralized AI Hub gateway handling 70K+ daily interactions with intelligent routing and fallbacks to various Bedrock models. Implemented comprehensive observability pipeline, guardrails for PII/toxicity, and policy enforcement. Worked with leadership to establish a governance board, ethics committee, and working groups for ongoing cross-functional knowledge sharing.
Result
Delivered 30x faster AI iteration. Observability surfaced prompt injection and PII exposure in existing initiatives for immediate remediation. Internal teams have built 100+ efficiency/productivity workflows with approved models and guardrails.
02 Company-Wide AI Innovation Program
Context
Business teams were blocked on engineering for AI and automation, creating bottlenecks in discovery and POC creation, while non-technical teams were severely overloaded with manual but automatable tasks consuming 20-40% of their week. Only 5% of the company, mostly in engineering, was leveraging AI capabilities despite competitive pressure.
Action
Designed and launched comprehensive AI enablement program reaching 130+ employees across 15+ technical and non-technical teams. Deployed N8N for no-code AI automation with several custom nodes for internal enablement. Integrated with centralized AI Hub for governance and cost control. Applied diffusion of innovation model for training department champions and spreading knowledge. Created self-serve documentation, videos, playbooks, and conducted 20+ training sessions. Worked with senior leadership to create and communicate the vision.
Result
Increased AI adoption from 5% to 90% of the company within 3 months. Reduced AI implementation timelines from 2 months to 1 day. The initial rollout immediately escalated prompt-injection and bias eval findings, giving us clear insight into our systems. Product teams were immediately able to capitalize on this infrastructure for natural-language initiatives. The legal team has confidence in our governance strategy for detecting and escalating issues. Non-technical teams shipped 30+ AI workflows independently. The company is on track to save $4M in efficiency gains from the initial automation workflows. Created a repeatable model now used for other technology rollouts.
03 Omen: Code Analysis CLI for AI-Assisted Development
Context
Tech debt backlogs that would take months to clear with a team of 30 devs, but no way to prioritize what actually mattered. PRs slipped through fast reviews in complex areas. Developers new to parts of the codebase didn't understand the nuances. Every change looked the same on the surface, but some touched high-churn, high-complexity code where bugs hide. AI tools could see the code, but not the weight behind it - that knot in your stomach when you open a god object knowing one wrong move could break everything.
Action
Built an open-source code analysis CLI in Go that surfaces the 'omens' - complexity hotspots, defect-prone files, architectural coupling, and technical debt. Started from PMAT/PAIML but rebuilt it for fast installs via Homebrew, cross-platform support, and seamless LLM integration. Added MCP server integration so Claude, Cursor, and Codex can see where the landmines are before writing code. Built quality gates via githooks that force AI to be more careful when touching risky areas. Supports 12+ languages through tree-sitter parsing.
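As a rough illustration of what a "hotspot" ranking can look like, the sketch below multiplies churn by complexity and sorts files by the product. This is an assumed heuristic chosen for illustration, not Omen's actual scoring; `FileStats`, `RankHotspots`, and the sample paths and numbers are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// FileStats holds two hypothetical per-file signals.
type FileStats struct {
	Path       string
	Churn      int // commits touching the file in some time window
	Complexity int // e.g. a cyclomatic complexity total for the file
}

// Score combines the signals; churn x complexity is an assumed heuristic.
func (f FileStats) Score() int { return f.Churn * f.Complexity }

// RankHotspots returns a copy of files sorted by descending score,
// leaving the input slice untouched.
func RankHotspots(files []FileStats) []FileStats {
	ranked := append([]FileStats(nil), files...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score() > ranked[j].Score()
	})
	return ranked
}

func main() {
	// Made-up sample data.
	files := []FileStats{
		{Path: "app/models/order.rb", Churn: 40, Complexity: 30},
		{Path: "lib/util.rb", Churn: 2, Complexity: 35},
		{Path: "app/controllers/home.rb", Churn: 50, Complexity: 3},
	}
	for _, f := range RankHotspots(files) {
		fmt.Printf("%-26s score=%d\n", f.Path, f.Score())
	}
}
```

The point of combining the two signals is that a complex file nobody touches, or a busy file that is trivial, ranks below a file that is both complex and frequently changed.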
Result
Refactored a large Rails API monolith - complexity dropped from 30 to under 12 across the codebase in 2 hours, zero regressions. Upgraded an Angular app from version 3 to 13 in 2 hours with significant code improvements and no regressions. Historical PR analysis proved the defect prediction works - flagging commits that would have caused production issues. Teams can now track codebase health trends over time and enforce quality standards beyond just test coverage.
04 AI-Powered Innovation & Ideation Platform
Context
Company suffering from 45-day define & design phase cycles in SDLC, creating significant bottlenecks in feature development. Ideas from across the organization couldn't reach the intake board, took too long to vet, or couldn't be represented well because the submitter didn't have time to flesh them out. This limited innovation to a small group and missed valuable insights from customer-facing teams. Traditional define & design process required extensive documentation and multiple stakeholder meetings before ideas could even be considered.
Action
Architected and launched company-wide innovation platform based on Amazon's PRFAQ (Press Release/FAQ) methodology. Built AI-powered assistants to help employees articulate ideas regardless of technical expertise, handling formatting, language refinement, and stakeholder-specific framing. Implemented intelligent idea routing to relevant teams and automated initial feasibility assessments. The platform can integrate with downstream systems via MCP servers, API endpoints, and vector databases for seamless handoff to development teams and/or agents.
Result
Expanded the idea submission pool to the entire company. Increased the quality of submissions, with AI helping each one read clearly to different audiences. Reduced idea definition & design time from 45 days to 1 hour (99.9% improvement). A centralized idea repository now exists that can also assist with future ideation.
05 Monolith to Microservices Transformation (2-Year Platform Evolution)
Context
Monolithic architecture causing 45-60 minute deploy times, blocking multiple teams who were colliding on each other's code changes. Test suites taking hours to run. Production elasticity issues where traffic spikes weren't detected quickly enough by HPAs, causing availability interruptions and customer impact. Teams did not have any microservice experience. Teams unable to work independently, creating bottlenecks across the organization.
Action
Led 2-year microservices adoption strategy coordinating across data, product, engineering, QA, and DevOps teams. Established Kubernetes platform with automated infrastructure provisioning, deployment pipelines, node group strategies, security protocols, service mesh for internal traffic controls. Deployed GraphQL federation to create frontend-backend contracts enabling decoupled development. Implemented event-driven architecture for data streaming between services. Introduced architectural quanta concepts and trained teams on domain/data decomposition, fracture planes, transactional sagas, and service sizing strategies. Built developer CLI tooling for service scaffolding and operations. Established service ownership paradigms with uptime responsibilities. Migrated to a schema registry of Protocol Buffers for backwards-compatible service contracts in gRPC and through Kafka.
Result
Reduced deployment times from 45-60 minutes to 2-4 minutes (95% improvement). Successfully decomposed monolith into 40+ production microservices over 2 years. Eliminated team blocking with independent service deployments. Improved system elasticity with granular scaling per service. Achieved 99.9% uptime through service-level ownership and monitoring. Enabled parallel development across all teams. Reduced the blast radius of the application. Introduced type safety with Go and distroless container security to our infrastructure.
06 Data Explorer Slackbot
Context
Getting business insights required knowing SQL, understanding complex data schemas, and having the time to write queries. With 120+ people across the company needing answers - driver experience teams checking market SLAs, executives looking at customer trends - everyone had questions but few could answer them. The data team was swamped with tickets, and one-off questions got ignored because they couldn't be prioritized over product work already in the backlog. Wait times stretched into weeks, so product decisions got made without the numbers. In a culture that required data-driven evidence for every decision, this bottleneck was slowing everything down. By the time data came back, sunk cost fallacy had already kicked in.
Action
I built a Slackbot that opened up data access to everyone in the company. When AWS Knowledgebases turned out to be too limited and buggy, I built custom MCP servers in Go - a choice I stand by for its type safety and performance. The system pulls from our data warehouse plus real-time clickstream and usage data, all running through Genkit for local development, evals, and A/B testing. I worked through the full spectrum of LLM challenges: guardrails and gateways, infinite loops, long response times, tool failures, parsing issues, graceful degradation, retries, MCP connectivity and session management, context engineering, persistent conversations, and prompting techniques like ReAct. I designed it to be modular so other internal automation and API gateways could use the same infrastructure.
Result
Questions that took weeks now get answered in seconds. The bot handles hundreds of conversations a day, generates charts, and goes beyond the initial ask - exploring interesting paths that surface deeper insights and prompting follow-up questions to help people who wouldn't already be thinking about what to ask next. The whole company now asks questions directly while the data team maintains evals and monitors for drift. They shifted from ticket triage to high-value work like machine learning - things they'd much rather be doing. It took longer to build the right way, but the result is faster iterations, automated tooling, and continuous improvement.
07 Cloud-Native Platform Transformation
Context
Company operating with manual deployments across 3 environments, experiencing 4+ hour recovery times, 1 hour build times, and 30% deployment failure rate impacting 60+ engineers.
Action
Led migration and creation of 40+ services to Kubernetes orchestration with Helm charts and ArgoCD GitOps. Architected multi-cluster strategy with Istio service mesh for traffic management. Implemented comprehensive observability stack with Prometheus/OTEL metrics, Jaeger distributed tracing, and standardized health checking. Enabled self-service deployments with ArgoCD SSO/RBAC.
Result
Reduced deployment failures from 30% to <2% and MTTR from 4 hours to <10 minutes. Decreased build time by 60%. Saved $500K annually through improved resource utilization.
08 Developer Productivity Platform Tooling (Go CLI with Plugin Architecture)
Context
70+ developers losing 5+ hours/week to inconsistent tooling and slow debugging cycles, plus a 7-day onboarding process, hampering growth and productivity and significantly increasing cognitive load.
Action
Architected and built extensible Go-based DevEx CLI with plugin SDK supporting 50+ commands. Implemented workstation bootstrapping, multi-environment cloud management tooling, code artifact tooling, and workstation setup tooling. Created versioned release pipeline via S3 with auto-update mechanism. Established plugin marketplace with templates enabling teams to contribute domain-specific tools.
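A plugin-based CLI usually rests on a small contract plus a registry that dispatches subcommands. The sketch below shows one plausible shape for that in Go; the `Plugin` interface, `Registry`, and `funcPlugin` adapter are hypothetical and are not the actual plugin SDK.

```go
package main

import (
	"fmt"
	"sort"
)

// Plugin is a hypothetical contract a team-contributed command would satisfy.
type Plugin interface {
	Name() string
	Run(args []string) error
}

// Registry maps command names to plugins.
type Registry struct{ plugins map[string]Plugin }

func NewRegistry() *Registry { return &Registry{plugins: map[string]Plugin{}} }

// Register adds a plugin and rejects duplicate command names.
func (r *Registry) Register(p Plugin) error {
	if _, exists := r.plugins[p.Name()]; exists {
		return fmt.Errorf("plugin %q already registered", p.Name())
	}
	r.plugins[p.Name()] = p
	return nil
}

// Names lists registered commands in sorted order.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.plugins))
	for n := range r.plugins {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// Dispatch routes args[0] to the matching plugin and passes the rest along.
func (r *Registry) Dispatch(args []string) error {
	if len(args) == 0 {
		return fmt.Errorf("no command given; available: %v", r.Names())
	}
	p, ok := r.plugins[args[0]]
	if !ok {
		return fmt.Errorf("unknown command %q", args[0])
	}
	return p.Run(args[1:])
}

// funcPlugin adapts a plain function to the Plugin interface.
type funcPlugin struct {
	name string
	run  func([]string) error
}

func (f funcPlugin) Name() string            { return f.name }
func (f funcPlugin) Run(args []string) error { return f.run(args) }

func main() {
	reg := NewRegistry()
	_ = reg.Register(funcPlugin{name: "hello", run: func(args []string) error {
		fmt.Println("hello", args)
		return nil
	}})
	if err := reg.Dispatch([]string{"hello", "world"}); err != nil {
		fmt.Println("error:", err)
	}
}
```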
Result
Reduced onboarding from 7 days to 10 minutes. Saved $500k+ by standardizing workstation tooling, access, and upgrade process. Achieved 100% adoption across engineering with over 80% using it daily. Plugin ecosystem grew to 20+ team-contributed extensions.
09 Unified Identity & Authentication Platform
Context
Monolithic authentication preventing accelerated microservice adoption. Deprecated authentication methods needed to be removed. Fragmented authentication across web and mobile causing security vulnerabilities and preventing enterprise SSO deals worth $5M+ annually.
Action
Built centralized OAuth2 identity service handling all OAuth, Password, OTP, SSO / SCIM workloads. Integrated Ory Hydra for token management and WorkOS for enterprise SSO/SCIM. Implemented secure session management with encrypted tokens. Added i18n support for global expansion. Orchestrated zero-downtime migration using feature flags.
Result
Unlocked $5M in enterprise deals requiring SSO. Greatly improved our security posture with the latest authentication methods and best practices. Enabled auto-provisioning and onboarding of enterprise organizations. Enabled single sign-on across all company properties. Decreased auth implementation time for new services from weeks to hours.
10 Unified Observability & Analytics Platform
Context
12 different tracking and monitoring tools scattered across teams. Logs in Datadog, infrastructure metrics in AWS, traces in Grafana/Tempo, application metrics in Grafana/Prometheus, plus overlapping analytics and session recording tools per team. Pages were bloated with redundant scripts, we were double-paying for the same data, and during incidents no one had a complete picture because visibility was fragmented across a dozen dashboards.
Action
Consolidated the entire observability stack down to two platforms: PostHog for user analytics with flexible dashboards and webhook integrations, and Coralogix for infrastructure visibility covering traces, metrics, logs, security monitoring, and OpenTelemetry instrumentation across both application and AI workloads.
Result
Reduced tooling costs by 20% and cut inter-AZ replication spend by 60% by switching from collect-process-duplicate to collect-and-send. Eliminated significant engineering hours previously spent maintaining self-hosted monitoring infrastructure. Product POC automation dropped from 2-3 weeks to 1 hour through PostHog's integration capabilities. Teams now have unified visibility instead of siloed dashboards.
11 Implementation of a Product Operating System
Context
Product and growth teams making decisions with limited data, running untracked experiments, unable to track success metrics for features being rolled out, and taking 6+ weeks to validate hypotheses, losing competitive advantage. Siloed teams (sales, marketing, product) working from different datasets.
Action
Implemented comprehensive Product OS integrating analytics, experimentation, A/B testing, and session recording across web and mobile. Deployed PostHog for product analytics with custom dashboards and cohort exports. Trained others on self-service experimentation platform with feature flags and A/B tests. Integrated data pipelines / webhooks with custom workflows.
Result
Reduced hypothesis validation from 4 weeks to 1 day. Improved feature auditability and visibility with a centralized dashboard and audit trail. Enabled cross-functional collaboration between marketing, product, and sales.
12 On-Demand Environment Platform
Context
Shared staging environment causing daily conflicts between 15+ teams: data consistency problems, feature flag collisions, and blocked releases. This created several-day testing bottlenecks for critical features and decreased confidence in work being rolled out.
Action
Architected self-service platform for ephemeral EKS environments provisioned via CI/CD. Managed costs with a spot-instance strategy and a sleeping-environment strategy. Implemented dynamic DNS, ingress routing, and ArgoCD application filtering. Built webhook broadcaster for external event mirroring across environments. Added intelligent resource management with auto-teardown and cost controls.
Result
Any engineer can run a single-line command to create an isolated environment with production-like data in 2 minutes. Eliminated environment conflicts saving 30+ hours weekly per team. Reduced feature testing time from 3 days to 2 hours. Enabled parallel testing of 20+ features simultaneously.
13 Database Platform Modernization (Neon Branching + Zero-Downtime Migration)
Context
Numerous incidents related to the elasticity of the primary database cluster. 10x overprovisioning to absorb burst traffic. Inability to properly test features due to the lack of production-like data, resulting in $50K+ incidents every few months.
Action
Led the vetting and adoption of Neon's branching database technology for instant production-like environments. Architected streaming RDS↔Neon replication for zero-downtime migration. Optimized connection pooling, reducing overhead by 60%. Implemented automated backup strategy with point-in-time recovery. Developed an internal Kubernetes operator that lets teams manage database branches with CRDs in their microservices. Led the adoption of VPC endpoints (VPCE) to reduce ingress/egress costs and enhance security.
Result
Reduced database experiment time from days to seconds. Feature testability with production-like data now exists across all environments. Over 40 clusters have point-in-time recovery and instant environment creation.
Contact
Colorado, USA
@ Dispatch
© 2025 Jonathan Reyes.
Wanting to make a positive impact on the world.