Posted 23 June, 2026

Software Engineer - Agentic Platform

Userpilot
Dublin, Dublin D01 V3P0, Ireland | Full Time
Reference: 1910543583

About Userpilot

Userpilot is a leading product analytics and user engagement platform used by product teams at hundreds of companies to understand, segment, and activate their users. The product spans a performant JavaScript SDK that runs inside customers' web apps, a Chrome Extension for building in-app UI without code, and a React dashboard that handles complex real-time data. All of it runs on a distributed Elixir/Phoenix backend that sustains hundreds of thousands of concurrent WebSocket connections, high-throughput Kafka event ingestion, and real-time content delivery at scale.

We move fast, we ship often, and we believe the best engineers care as much about the product they're enabling as the systems and interfaces they build.

The Role

This is an AI-deep role focused on Lia, Userpilot's agent platform, the system that turns a rich product-data model into reliable, grounded, multi-turn AI experiences. The AI is the product, not just a tool you use to build it.

You'll own and elevate the agent platform: a Python service built on Microsoft Agent Framework, with hybrid retrieval over multiple tool catalogs, complex multi-step orchestration using skills and sub-agents, multi-turn state and grounding, and full trace-level observability and cost accounting, all built on framework-neutral domain contracts.

This is a platform you own and push further, not just keep running. You'll contribute to architecture, raise the reliability and eval bar, and help define where a frontier agentic system goes. We hire engineers who can follow a problem wherever it leads, who know when deterministic logic or statistics beat an LLM and vice versa, and who care about the customer experience as much as the system underneath.

What You'll Work On
  • Conversational AI experiences grounded in a rich product-data model, with the tool use, retrieval, streaming, and orchestrated multi-turn grounding required to do it reliably, not just plausibly.
  • The agent runtime and orchestration itself: complex, multi-step agentic workflows, behind framework-neutral domain contracts that keep business logic portable.
  • Hybrid retrieval and tool grounding: RAG (vector + lexical) over tool catalogs assembled from multiple sources (OpenAPI specs, MCP, ...), so the agent calls the right operation with the right arguments against live customer data.
  • Packaged AI workflows that produce durable, editable, actionable outputs, not just chat answers that get lost in history.
  • The eval, observability, and cost infrastructure that makes all of this safe and economically viable in production: a multi-layer eval harness (deterministic checks plus live, judge-scored reasoning evals), end-to-end tracing, and per-call cost accounting.
  • Agent interoperability: an MCP server that exposes Userpilot's tools to external AI agents.
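To make the hybrid retrieval item above concrete, here is a minimal sketch of fusing a vector ranking and a lexical ranking over a tool catalog. It is an illustration only, not a description of how Lia is implemented: the catalog entries, the token-overlap scorer, the hard-coded vector ranking, and the choice of reciprocal rank fusion are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of tool names into a single ranking.

    Each tool earns 1 / (k + rank) from every list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, tool in enumerate(ranking, start=1):
            scores[tool] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda tool: (-scores[tool], tool))


def lexical_rank(query, catalog):
    """Rank tools by how many query tokens appear in their description."""
    query_tokens = set(query.lower().split())
    overlap = {
        name: len(query_tokens & set(description.lower().split()))
        for name, description in catalog.items()
    }
    return sorted(catalog, key=lambda name: (-overlap[name], name))


# Hypothetical tool catalog (names and descriptions are invented).
CATALOG = {
    "list_segments": "list user segments for an app",
    "create_flow": "create an onboarding flow",
    "get_event_counts": "get event counts for a segment over time",
}

# Stand-in for the output of an embedding search; hard-coded here
# so the example stays self-contained.
vector_ranking = ["get_event_counts", "list_segments", "create_flow"]
lexical_ranking = lexical_rank("event counts for segment", CATALOG)

fused = reciprocal_rank_fusion([vector_ranking, lexical_ranking])
print(fused[0])
```

Rank-based fusion is used in the sketch because it avoids having to normalize scores that come from two very different retrievers; a production system might weight or rerank the lists differently.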


What You'll Do
  • Design, build, and operate the agent platform end to end, from the API surface through the runtime, tools, retrieval, persistence, and observability.
  • Build LLM/agent features that ground reliably in customer data, with the streaming, retries, evals, and graceful degradation required to hold them to a production reliability bar.
  • Pick the right tool for each signal (retrieval, deterministic logic, structured outputs, statistics, or an LLM), and combine them well.
  • Treat evals, cost-per-call, and latency as first-class concerns. AI features that run continuously at scale have unit economics; the economics matter as much as the output.
  • Work in a spec-driven, agent-assisted flow, reading and contributing to PRDs that drive both human and AI implementation.
  • Contribute to the team's agentic infrastructure (AGENTS.md, CLAUDE.md, DESIGN.md, slash commands, architectural rules) so AI tooling understands our codebase as well as the humans do.
  • Review code for architectural consistency and reliability, including making sure agent-generated code respects the same boundaries and framework-neutral contracts that human-written code does.
  • Raise the bar around you: set the patterns, write the specs and evals others build on, and level up the engineers (and agents) working in the platform.


What We're Looking For

Required
  • 3+ years building and shipping production software, with a track record of owning systems (not just features) and raising the quality bar for the people around you.
  • Strong Python and CS fundamentals, including solid work with databases, queues, or real-time systems. The agent platform runs on Python (FastAPI, Pydantic, async), so you're fluent here or will be very quickly.
  • Production agentic / LLM systems, not just calling an API: tool use, retrieval grounding, structured outputs, multi-turn state and continuity, streaming, evals, and designing for non-deterministic behavior. Having owned an agent runtime or orchestration layer end to end is a strong signal.
  • Architectural judgment for AI systems: you keep domain logic decoupled from a fast-moving vendor framework, make build-vs-adopt calls deliberately, and know why that matters when the framework landscape shifts every quarter.
  • Judgment about when to use an LLM and when not to: you reach for deterministic logic, retrieval, or statistics when they're more reliable, cheaper, or more reproducible, and you can tell which is which.
  • AI-native workflow: you use AI coding agents (Claude Code, Cursor) as a real part of how you build, prompting for scaffolding, reviewing output critically, and knowing when to push back.
  • Strong product sense and judgment. You care about the user experience and about system correctness in equal measure.
  • Self-management and a continuous-improvement mindset. We don't over-prescribe how the work gets done.

Bonus Points
  • Experience with agent frameworks or orchestration: Microsoft Agent Framework, LangGraph, AutoGen, or a runtime you built yourself.
  • RAG and tool-use platforms (retrieval over tools and APIs, OpenAPI-driven tool generation, MCP).
  • LLM evals and observability: designing them, running them, and acting on the signal, with tracing and cost tooling like Langfuse or OpenTelemetry GenAI.
  • Cost engineering on LLM workloads (caching, batching, model routing, prompt compaction).
  • Embedding-based retrieval or clustering (vector DBs, hybrid search, HDBSCAN, UMAP, and similar).
  • Multi-tenant SaaS architecture: data isolation, per-tenant state, noisy-neighbor concerns.
  • Full-stack / core-services depth: production React/TypeScript, and/or our core stack (Elixir/Phoenix with OTP, ClickHouse, Kafka). You won't live here day to day, but it helps where the agent platform meets the rest of the product.
  • Time-series anomaly detection or drift monitoring; recommendation or ranking systems with user-feedback loops.
  • Spec-driven development, writing or working from specs that drive both human and AI implementation.
  • Contributing to developer experience or agentic infrastructure.
  • Technical leadership on an engineering team.
  • Open source contributions.


How We Build

AI is at the center of what we ship and how we ship it. A few things we believe about how the work gets done:
  • Statistics, heuristics, and LLMs each have a role. The mistake we don't want to make is asking an LLM to do anomaly detection or risk scoring directly: wrong economics, wrong reliability, wrong reproducibility. Use the LLM where it's strongest; use statistics where they're strongest; use heuristics where they're cheapest.
  • Features start with a written spec (a PRD that captures intent and constraints), not a two-line ticket, whether the implementer is a human or an agent.
  • Coding agents do the scaffolding; engineers own the architecture, the review, and the judgment calls.
  • Evals are how we ship safely. Every LLM-shaped feature gets an eval suite before it goes to production, and we look at the suite, not just whether it ran.
  • LLM calls have a cost; they are not free. Caching, batching, model routing, and prompt compaction are first-class engineering concerns.
  • Feedback loops are how AI features get smarter. Instrument everything.
  • Our patterns are encoded explicitly. Every umbrella app and product domain has an AGENTS.md capturing what it does, the patterns it uses, and the mistakes to avoid, so an agent working on core doesn't violate a cache invariant or write directly to ClickHouse, and an agent on the dashboard doesn't break a design contract.
  • DX is a product: if a new engineer (or an AI agent) can't understand a domain from its documentation and rules, that's a bug we fix.
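As an illustration of the first point above (statistics rather than an LLM for anomaly detection), here is a minimal sketch of a rolling z-score check. The function name, window size, threshold, and sample data are assumptions chosen for the example; it is not Userpilot's detector.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value deviates strongly from the trailing window.

    A point is flagged when it lies more than `threshold` sample standard
    deviations from the mean of the preceding `window` points.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: a z-score is undefined, so skip this point.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Invented daily event counts with one obvious spike at index 8.
daily_events = [100, 102, 98, 101, 99, 103, 100, 101, 250, 100]
print(zscore_anomalies(daily_events))
```

A check like this is cheap, deterministic, and reproducible; an LLM can then be reserved for explaining a flagged point in context rather than for deciding whether it is anomalous.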

You don't need to have done all of this at your last job. But you should be genuinely curious about it, comfortable owning a system end to end, and excited to help shape how AI products get built here.

Right to Work

Candidates must have the right to work in Ireland. We are not in a position to offer visa sponsorship for this role.

Equal Opportunities Statement

Userpilot is an equal opportunity employer. We are committed to creating an inclusive environment for all employees and applicants. We do not discriminate on the basis of gender, civil status, family status, age, disability, race, religion, sexual orientation, or membership of the Traveller community, in accordance with the Employment Equality Acts 1998-2015.

Data Privacy Notice

By applying for this role, your personal data will be processed by Userpilot for the purposes of recruitment and candidate evaluation. We will retain your information for no longer than is necessary for this purpose.
