Get to know the role:
Build and manage the data asset using some of the most scalable and resilient open-source big data technologies, such as Airflow, Spark, Apache Atlas, Kafka, YARN, HDFS, Elasticsearch, Presto/Dremio, HDP, a visualisation layer, Snowflake and more.
Design and deliver the next-gen data lifecycle management suite of tools/frameworks, including ingestion and consumption on top of the data lake, to support real-time, API-based and serverless use cases, along with batch (mini/micro-batch) where relevant
Build and expose a metadata catalogue for the data lake to support easy exploration, profiling and lineage requirements
Enable Data Science teams to test and productionise various ML models, including propensity, risk and fraud models, to better understand, serve and protect our customers
Lead technical discussions across the organisation through collaboration, including running RFC and architecture review sessions, tech talks on new technologies, and retrospectives
Apply core software engineering and design concepts to create operational as well as strategic technical roadmaps for business problems that are vague or not fully understood
Obsess over security by ensuring all components, from the platform and frameworks to the applications, are fully secure and compliant with the group’s infosec policies.
The must-haves:
2+ years of relevant experience in developing scalable, secure, fault-tolerant, resilient and mission-critical big data platforms.
Able to maintain and monitor the ecosystem with 99.9999% availability
Candidates will be aligned appropriately within the organisation depending on experience and depth of knowledge.
Must have a sound understanding of big data components and administration fundamentals, with hands-on experience in building a complete data platform using various open-source technologies.
Must have strong fundamental hands-on knowledge of Linux and of building a big data stack on top of AWS/Azure using Kubernetes.
Strong understanding of big data and related technologies such as HDFS, Spark, Presto, Airflow, Apache Atlas, etc.
Good knowledge of complex event processing (CEP) and stream processing systems such as Spark Streaming, Kafka, Apache Flink and Apache Beam
Experience with NoSQL databases (key-value, document, graph and similar)
Proven ability to contribute to the open-source community and stay up to date with the latest trends in the big data space.
Able to drive DevOps best practices such as CI/CD, containerisation and blue-green deployments