RESPONSIBILITIES:
● Develop and operate cross-company data infrastructure.
● Manage data integration and data pipelines.
● Aggregate data from various internal products, services, and CRM tools into a data lake.
● Distribute data collected in the data lake to the appropriate analytics and ML platforms.
● Build and maintain data analysis infrastructure.
● Optimize DWH performance.
● Ensure data quality.
REQUIREMENTS:
Must have:
1. AI software development
- Experience developing AI algorithms, or an understanding of how they work from scratch
- Experience developing an API (Application Programming Interface) served over the internet
- Experience operating and managing an API service with appropriate tooling
- Experience controlling and tuning model accuracy through data operations
2. Data engineering
- Data Warehousing
- ETL Processes
- Cloud Platforms (AWS, Azure, GCP)
- Python
- SQL
- Big Data Technologies (Hadoop, Spark, Kafka)
- Data Modeling
- Data Quality
- Data Security
- Version Control (Git)
3. Tools
- Infrastructure as code: Terraform
- CI/CD: GitHub Actions
- Monitoring and logging: Datadog, Cloud Monitoring, CloudWatch
- Project management: JIRA Cloud, Miro, ...
- Documentation: Kibela, Google Workspace
- Spark: 2-3 years of experience
- Airflow: 1 year (as a user, not an administrator)
4. Cloud architecture
- Experience developing systems using cloud services on AWS, GCP, or Azure (AWS preferred).
- Understanding of cloud service components from an architectural perspective (what each service can do, how it works, and what to watch out for).
- Experience designing and integrating systems with appropriate security.