Data Engineer
Dear Vendors,
I hope this email finds you well.
Role: Data Engineer / Big Data Engineer
Location: Remote
Client: HCL
Job Description:
Job Overview:
We’re seeking a highly skilled Data Engineer / Big Data Engineer to build scalable data pipelines, develop ML models, and integrate big data systems. You’ll work with structured, semi-structured, and unstructured data, focusing on optimizing data systems, building ETL pipelines, and deploying AI models in cloud environments.
Key Responsibilities:
Data Ingestion: Build scalable ETL pipelines using Apache Spark, Talend, AWS Glue, Google Dataflow, or Apache NiFi. Ingest data from APIs, file systems, and databases (a PySpark ingestion sketch follows this list).
Data Transformation & Validation: Use Pandas, Apache Beam, and Dask for data cleaning, transformation, and validation. Automate data quality checks with Pytest or Unittest (a Pandas/Pytest sketch follows this list).
Big Data Systems: Process large datasets with Hadoop, Kafka, Apache Flink, and Apache Hive. Stream real-time data using Kafka or Google Cloud Pub/Sub (a Kafka consumer sketch follows this list).
Task Queues: Manage asynchronous processing with Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task status (a Celery retry sketch follows this list).
Scalability: Optimize for performance with distributed processing (Spark, Flink), parallelization (joblib), and data partitioning (a joblib sketch follows this list).
Cloud & Storage: Work with AWS, Azure, GCP, and Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse Analytics, and HDFS.
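For illustration, a minimal PySpark sketch of the ingestion responsibility: read raw CSV, de-duplicate, and write date-partitioned Parquet. The bucket paths and column names (event_id, event_ts) are hypothetical, not from any client spec.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-events").getOrCreate()

# Read raw CSV records (header row, schema inferred).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("s3a://example-bucket/raw/events.csv"))  # hypothetical source path

# De-duplicate and derive a partition column from the event timestamp.
cleaned = (raw.dropDuplicates(["event_id"])
              .withColumn("event_date", F.to_date("event_ts")))

# Write curated, date-partitioned Parquet.
(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3a://example-bucket/curated/events/"))  # hypothetical sink
```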
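A minimal sketch of automated data-quality checks with Pandas and Pytest, assuming a made-up two-column schema (user_id, amount):

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows missing the key column and coerce amounts to numeric."""
    out = df.dropna(subset=["user_id"]).copy()
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce").fillna(0.0)
    return out

def test_clean_removes_null_keys_and_bad_amounts():
    dirty = pd.DataFrame({"user_id": [1, None], "amount": ["3.5", "oops"]})
    result = clean(dirty)
    assert result["user_id"].notna().all()   # no null keys survive
    assert (result["amount"] >= 0).all()     # bad amounts coerced, not crashed
```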
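A minimal real-time consumption sketch using the kafka-python client; the broker address, topic name, and group id are placeholders:

```python
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                                  # hypothetical topic
    bootstrap_servers=["localhost:9092"],      # placeholder broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
    group_id="etl-workers",
)

# Consume records as they arrive and hand each to the transform/load step.
for message in consumer:
    record = message.value
    print(record)  # stand-in for downstream processing
```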
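A minimal Celery sketch of an asynchronous task with retries, assuming a RabbitMQ broker at a placeholder URL; TransientError stands in for whatever recoverable failure the real load step raises:

```python
from celery import Celery

app = Celery("pipeline", broker="amqp://guest@localhost//")  # placeholder RabbitMQ URL

class TransientError(Exception):
    """Stand-in for a recoverable failure (timeout, flaky endpoint, etc.)."""

@app.task(bind=True, max_retries=3, default_retry_delay=30)
def load_batch(self, batch_id: int) -> None:
    try:
        # ... fetch the batch and load it downstream here ...
        pass
    except TransientError as exc:
        # Re-queue with a delay; Celery records task state so status can be tracked.
        raise self.retry(exc=exc)
```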
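And a minimal joblib parallelization sketch, where transform_partition is a stand-in for real per-partition work fanned out across CPU cores:

```python
from joblib import Parallel, delayed

def transform_partition(rows):
    return [r * 2 for r in rows]  # stand-in per-partition computation

partitions = [[1, 2], [3, 4], [5, 6]]

# n_jobs=-1 uses all available cores; each partition is processed independently.
results = Parallel(n_jobs=-1)(
    delayed(transform_partition)(p) for p in partitions
)
```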
Required Skills:
ETL & Data Processing: Expertise in Apache Spark, AWS Glue, Google Dataflow, Talend.
Big Data Tools: Proficient with Hadoop, Kafka, Apache Flink, Hive, Presto.
Databases: Strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.
Machine Learning: Hands-on with TensorFlow, PyTorch, Scikit-learn, XGBoost.
Cloud Platforms: Experience with AWS, Azure, GCP, Databricks.
Task Management: Familiar with Celery, RQ, RabbitMQ, Kafka.
Version Control: Git for source code management.
Desirable Skills:
Real-time Data Processing: Experience with Apache Pulsar, Google Cloud Pub/Sub.
Data Warehousing: Familiarity with Redshift, BigQuery, Synapse Analytics.
Scalability & Optimization: Knowledge of load balancing (NGINX, HAProxy) and parallel processing.
Data Governance: Use of MLflow, DVC, or other tools for model and data versioning.
Tools & Technologies:
ETL: Apache Spark, Talend, AWS Glue, Google Dataflow.
Big Data: Hadoop, Kafka, Apache Flink, Presto.
Databases: MySQL, PostgreSQL, MongoDB, Cassandra.
Cloud: AWS, GCP, Azure, Databricks.
Storage: S3, BigQuery, Redshift, Synapse Analytics, HDFS.
Version Control: Git.
—
Thanks & Regards
Mohd Irfan
- Technical Recruiter