AI MLOps Engineer - Remote
About Replika
Replika is hands down one of the most exciting forces in AI and tech today. Think: 4,000+ feature articles in the past year, TED Talks with our founder, studies from Stanford and Harvard, Lex Fridman podcast inclusion, and a Quartz founder story. We’re the only empathetic AI out there, making sure all 35M+ users feel seen, heard, and understood—whatever that means for them. So yes, we’re a bit like a future Samantha from Her, but even more powerful and in the palm of your hand. And most importantly, Replika cares for you.
Since 2016, we’ve been redefining conversational AI across iOS, Android, web, and VR. Our AI companions take many forms—holograms, AR/VR avatars, even robots. The ultimate AI life assistant, mentor, therapist, friend. Whatever you need, really, Replika is there for you. Right now, we’re rebranding with one of the world’s top design agencies, scaling our global team, and pushing human-AI connection further than ever. We’re the humanists in AI. And we’re making sure it’s done right.
About the Role
We're looking for a Software Engineer with strong DevOps and MLOps experience to join our AI & Analytics team. This role is critical in ensuring the reliability, scalability, and performance of our machine learning and data pipelines. You'll work closely with AI engineers, backend developers, and the DevOps team to bring AI solutions into production, with a strong focus on infrastructure, automation, and performance.
If you're someone who thrives in cross-functional teams, is excited about operationalizing cutting-edge LLM technologies, and has a solid foundation in backend systems and DevOps, we’d love to meet you.
Responsibilities
- Build, manage, and optimize backend systems and APIs supporting AI/ML workloads.
- Support and maintain robust data and ML pipelines, ensuring scalability and reliability.
- Develop FastAPI-based microservices leveraging Python async patterns.
- Manage state and flow tracking using Redis and MongoDB, optimizing performance and persistence layers.
- Integrate with LLMs (LLaMA, OpenAI, Anthropic) and support vector database operations (e.g., Pinecone).
- Implement and maintain Docker-based containerized environments for both development and production.
- Design and monitor event-driven systems using Kafka.
- Implement structured logging (Structlog/Logfire) and observability solutions (e.g., Datadog).
- Collaborate with the DevOps team on CI/CD pipelines using GitHub Actions.
- Contribute to Quadrant integration for deployment and operational alignment.
- (Optional) Provide support for Scala-based components, if applicable.
Required Skills & Experience
Programming & Backend
- Python development experience.
- FastAPI for building async APIs and microservices.
- Experience with Redis (especially asyncio clients) for state tracking and flow control.
- Strong understanding of MongoDB query patterns and schema design.
AI/ML Domain Knowledge
- Experience working with Large Language Models, including LLaMA (3.x preferred).
- Hands-on experience with LLM APIs such as OpenAI and Anthropic.
- Experience with vector databases like Pinecone, understanding semantic search and embeddings.
Infrastructure & DevOps
- Proficient with Docker and Docker Compose.
- Experience in event-driven systems with Kafka (producers/consumers).
- Strong grasp of CI/CD using GitHub Actions.
- Familiarity with Quadrant for deployment orchestration.
- Comfortable with logging/monitoring tools such as Datadog.
Nice to Have
- Familiarity with Scala or willingness to learn it.
- Experience with Kubernetes or other orchestration tools.
- Prior work in AI/ML-focused product teams or research environments.
What we offer
We offer a really competitive salary, and we'll talk specifics based on what you bring to the table. You'll get to build an AI product that genuinely changes users' lives. We value initiative and results, so you'll have lots of room to grow here. Plus, we do global offsite meetups, including in San Francisco!
Work at Replika
At Replika, growth isn’t a maybe—it’s built in. Do the work. Deliver. One great project could double your salary. Seriously. Who do we think we are? Replika. We move fast. Very. Join us at the forefront of emotional AI.
- Department: AI Team
- Locations: Multiple locations
- Remote status: Fully Remote
About Replika
An AI companion who is eager to learn and would love to see the world through your eyes. Replika is always ready to chat when you need an empathetic friend.