Data Science UA is a service company with deep expertise in AI and Data Science. Our story started in 2016 with the first Data Science UA Conference in Kyiv, and since then, we’ve built one of the largest AI communities in Europe.
About the role and product:
We are looking for an experienced Data Engineer to join the Data & AI team delivering enterprise-grade data platform capabilities. The role centres on building Python-based ETL pipelines across a Medallion architecture, developing and operationalising data quality rules using both Python and the Informatica platform (IDQ, EDC, Axon), and ensuring robust governance integration.
Responsibilities:
– Design, build, and maintain Python-based ETL pipelines across Bronze, Silver, and Gold layers following the Medallion architecture pattern.
– Develop and operationalise data quality (DQ) rules in Python, covering validity, completeness, consistency, uniqueness, accuracy, and timeliness dimensions.
– Build and maintain DQ rules using Informatica IDQ 10.5 on-premises, including mapplet design, profile configuration, and scorecard setup.
– Support data cataloguing and lineage activities using Informatica EDC, including schema scanning, data classification, and lineage mapping.
– Configure and maintain Informatica Axon for governance workflows, business glossary management, policy enforcement, and surfacing DQ scores to stakeholders.
– Develop cross-system DQ checks including referential integrity, reconciliation, and entity deduplication as scheduled Python jobs.
– Integrate DQ outputs into Power BI dashboards, contributing to the DQ Index Aggregation layer that unifies IDQ scorecard scores and Python-based scores.
– Support the IDQ–Axon integration, including Axon Agent configuration and DQ metric propagation into Axon dashboards.
– Contribute to the Python DQ framework package, maintaining modular, testable, and well-documented code with comprehensive unit test coverage.
– Support Talend-based ingestion processes at the Bronze layer, ensuring schema conformance and completeness checks.
– Participate in root-cause analysis of DQ failures using the programme’s three-tier classification taxonomy.
Requirements:
– Strong Python development skills, including experience with data processing libraries (pandas, PySpark) and testing frameworks (pytest).
– Hands-on experience with Informatica IDQ 10.5 on-premises: Developer Tool, mapplet design, profiles, scorecards, and expression-based DQ rules.
– Working knowledge of Informatica EDC for schema scanning, data lineage, and data classification.
– Experience with Informatica Axon for governance, business glossary management, human-in-the-loop (HITL) steward workflows, and DQ score surfacing.
– Solid understanding of the Medallion architecture (Bronze / Silver / Gold) and data lake design patterns.
– Working knowledge of SQL and relational databases for data validation and reconciliation.
– Experience building production-grade Python ETL pipelines with appropriate logging, error handling, and modular design.
– Experience with CI/CD practices for data pipelines and version control (Git).
Nice to have:
– Familiarity with Talend for data ingestion workflows.
– Exposure to AI/ML techniques for data quality: anomaly detection, entity resolution, DQ score aggregation.
– Experience generating or consuming Informatica IMX-format XML programmatically.
– Knowledge of Power BI for building or maintaining DQ score dashboards.
– Understanding of data governance frameworks, stewardship models, and regulatory reporting requirements.
– Experience with the Informatica IDQ–Axon integration bridge (Axon Agent setup and configuration).
Our client offers:
– Good compensation;
– Benefit package;
– Strong team and career growth;
– Challenges every day!
Know someone who'd be a great fit for this role?
Submit your candidate via the referral program and receive a bonus for a successful recommendation!