Senior MLOps Engineer

Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 by uniting top AI talent and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

About the client:

Our client is a large-scale financial project with an extensive network and a strong market presence. The company manages vast amounts of financial data and is focused on enhancing its data infrastructure to support innovative solutions. With a commitment to long-term development, the team works on complex, high-impact projects — from optimizing data pipelines to implementing modern technologies across both cloud and on-premises environments. This is an opportunity to join a dynamic, data-driven company that values technical expertise and encourages growth.

About the role:
We are looking for a talented and passionate MLOps Engineer to design and implement scalable infrastructure to support a growing Data Science team. The ideal candidate will build and maintain robust systems for storing and versioning models, artifacts, and datasets, as well as provide an efficient cloud / on-premises infrastructure for ML processes.

Requirements:

– 3+ years of experience in DevOps/MLOps/Cloud Engineering.
– 3+ years of experience with on-premises ML infrastructure.
– Deep understanding of the AWS ecosystem.
– Proven experience developing scalable systems with hybrid infrastructure.
– Cloud & Infrastructure: Experience with AWS (ECS/EKS, S3, Lambda, SageMaker), Terraform/CloudFormation, and cloud resource optimization.
– ML Operations: Model versioning, ML registries, monitoring, model optimization (preferably with PyTorch).
– Development & Automation: Python, Docker, CI/CD for ML, automation, monitoring, and logging.
– Security & Compliance: Cloud infrastructure security, IAM, and ML security best practices.

Nice to have:

– AWS certifications (Solutions Architect, DevOps Engineer).
– Experience with distributed training systems.
– Ability to mentor and develop a team.
– Strategic thinking for infrastructure development planning.
– Excellent communication skills for working with diverse teams.
– Proactive approach to problem-solving.
– Ability to balance development speed and quality of solutions.

Key performance indicators:

– Successful implementation and support of ML infrastructure.
– Optimization of cloud resources and associated costs.
– Improvement of ML model deployment processes.
– Ensuring high availability and reliability of systems.
– Effective knowledge transfer to team members.

Responsibilities:

– Design and implementation of infrastructure for scaling machine learning processes.
– Development and support of ML artifact management systems (models, datasets, experiments).
– Configuration and optimization of cloud infrastructure for ML processes.
– Collaboration with Data Engineers to implement and optimize the feature store.
– Ensuring security and reliability of ML infrastructure.
– Participation in shaping the strategy for the team’s technical infrastructure.

The company offers:

– Remote work.
– Close cooperation with Data Science and Data Engineering teams.
– Participation in the formation of technical strategy.
– The opportunity to influence the development of ML infrastructure.
