The internet’s evolution has progressed through distinct phases. Cisco became synonymous with the internet itself, providing the foundational infrastructure that enabled client-server architectures and established the backbone of global connectivity. Subsequently, API gateways and platforms like MuleSoft facilitated human-to-web commerce. We are now entering a phase defined by autonomous agent-to-agent communication, a fundamental shift towards interconnected intelligent systems.
Just as Cisco and API gateway technologies defined their respective eras, Tetrate is now defining the next generation of the internet. We are building the foundational platform that enables secure, scalable, and efficient communication between Large Language Models (LLMs) and autonomous agents. By architecting the control plane for these complex interactions, we are establishing the core infrastructure for the next phase of interconnected AI systems.
We are seeking a Technical Lead to drive the development and optimization of our multi-cluster control plane. This role is crucial for ensuring the reliability and performance of our service mesh solutions, specifically Tetrate Service Bridge (TSB) and Tetrate Istio Subscription Plus (TIS+). You will contribute directly to the engineering challenges inherent in building distributed systems that support the next wave of AI-powered applications.
Responsibilities:
- Lead the architecture, design, and development of service mesh products, ensuring scalability, security, and reliability.
- Collaborate closely with product management, UX, and cross-functional teams to define technical roadmaps and feature priorities.
- Mentor and guide a team of engineers, fostering a culture of excellence, innovation, and continuous improvement.
- Drive the adoption of best practices in software development, including performance optimization, CI/CD, security, and observability.
- Engage with the open-source community and contribute to projects like Istio, Envoy, and Kubernetes to enhance our ecosystem.
- Troubleshoot complex production issues, optimize system performance, and drive root cause analysis for distributed systems.
Required Skills:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
- 10+ years of experience in software engineering, including at least 3 years in a technical leadership role.
- Expertise in Golang (preferred) or C++, with strong systems-level programming experience.
- Deep understanding of Kubernetes, Istio, Envoy, and modern service mesh architectures.
- Experience with multi-cluster networking, security (mTLS, RBAC, Zero Trust), and control plane/data plane architectures.
- Strong knowledge of gRPC, HTTP/2, WebAssembly (WASM), and API Gateway technologies.
- Expertise in cloud platforms (AWS, GCP, Azure), Linux networking, eBPF, and performance tuning.
- Proven ability to lead and deliver complex distributed systems in high-scale production environments.
- Experience with performance profiling and optimization of high-throughput distributed systems, using tools such as pprof and eBPF tracing.
- Experience with declarative infrastructure and GitOps practices, using Terraform, Helm, ArgoCD, or Flux.
- Strong fundamentals-based problem-solving skills; ability to break down complex problems using first-principles thinking rather than conventional approaches.
- Passion for mentoring engineers, driving technical excellence, and fostering a high-performance engineering culture.
- Ego-free collaboration: you look for the best ideas wherever they come from, contribute effectively outside your direct expertise, and approach problems with the best outcome for the team in mind.
- A bias toward action rather than analysis paralysis; you drive work to the finish line on time and with high quality.
- You value autonomy and results over process.
Preferred Skills:
- Experience contributing to open-source projects, especially in the Istio, Envoy, or Kubernetes ecosystem.
- Expertise in security best practices in service mesh environments, including SPIFFE/SPIRE, OPA, and Zero Trust architectures.
- Deep networking expertise, including L3/L4/L7 protocols, DNS, iptables, and service-to-service communication strategies.
- Hands-on experience with WASM extensions for Envoy, enabling custom policies, logging, or traffic control mechanisms.
- Expertise in large-scale observability solutions, including Prometheus, OpenTelemetry, Jaeger, and distributed tracing methodologies.
- Understanding of data plane acceleration techniques, such as DPDK and XDP, for high-performance networking use cases.
- Background in chaos engineering and resiliency testing, using tools like Chaos Mesh, Litmus, or Gremlin.
- Strong understanding of multi-tenancy, service segmentation, and access control in Kubernetes-based environments.
Location: We are a fully distributed company with a presence in 15 countries, and we welcome talent from anywhere. This role requires coverage of North American time zones, so you must be willing to work during those hours.