
Urgent Hiring || Data Engineer / Big Data Engineer with AI Deployment || Remote


Dear Vendors,

I hope this email finds you well.

Role: Data Engineer / Big Data Engineer
Location: Remote
Client: HCL
Job description:

Job Overview:
We’re seeking a highly skilled Data Engineer / Big Data Engineer to build scalable data pipelines, develop ML models, and integrate big data systems. You’ll work with structured, semi-structured, and unstructured data, focusing on optimizing data systems, building ETL pipelines, and deploying AI models in cloud environments.
Key Responsibilities:
Data Ingestion: Build scalable ETL pipelines using Apache Spark, Talend, AWS Glue, Google Dataflow, or Apache NiFi. Ingest data from APIs, file systems, and databases (see the first sketch after this list).
Data Transformation & Validation: Use Pandas, Apache Beam, and Dask for data cleaning, transformation, and validation. Automate data quality checks with Pytest or Unittest (second sketch below).
Big Data Systems: Process large datasets with Hadoop, Kafka, Apache Flink, and Apache Hive. Stream real-time data using Kafka or Google Cloud Pub/Sub.
Task Queues: Manage asynchronous processing with Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task status (third sketch below).
Scalability: Optimize for performance with distributed processing (Spark, Flink), parallelization (joblib), and data partitioning.
Cloud & Storage: Work with AWS, Azure, GCP, and Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse Analytics, and HDFS.
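
For illustration only (these sketches are ours, not the client's): a minimal batch-ETL pipeline in PySpark, assuming a local Spark session; the bucket paths and the event_id/event_ts column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Ingest: read semi-structured JSON events from object storage.
    raw = spark.read.json("s3a://example-bucket/events/")  # hypothetical path

    # Transform: drop malformed rows, derive a partition column, dedupe.
    clean = (
        raw.dropna(subset=["event_id", "event_ts"])
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Load: write partitioned Parquet into the warehouse layer.
    clean.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3a://example-bucket/warehouse/events/"  # hypothetical path
    )
    spark.stop()

The same read-transform-write shape carries over to Glue or Dataflow jobs.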
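
Second, a minimal sketch of an automated data-quality check with Pandas and pytest; the orders.csv layout and its order_id/amount columns are assumptions for the example.

    import pandas as pd

    def load_orders(path: str) -> pd.DataFrame:
        # Coerce bad numeric values to NaN, then drop incomplete rows.
        df = pd.read_csv(path)
        df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
        return df.dropna(subset=["order_id", "amount"])

    def test_orders_are_valid(tmp_path):
        sample = tmp_path / "orders.csv"
        sample.write_text("order_id,amount\n1,9.99\n2,not-a-number\n")
        df = load_orders(str(sample))
        assert df["amount"].ge(0).all()   # no negative amounts survive cleaning
        assert df["order_id"].is_unique   # primary-key style uniqueness check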
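
Third, a minimal sketch of a Celery task with a retry mechanism, assuming a local RabbitMQ broker; the task name and API URL are illustrative.

    import requests
    from celery import Celery

    app = Celery("pipeline", broker="amqp://guest@localhost//")

    @app.task(bind=True, max_retries=3, default_retry_delay=30)
    def fetch_records(self, api_url: str) -> int:
        """Pull records from an API endpoint, retrying on transient failure."""
        try:
            resp = requests.get(api_url, timeout=10)
            resp.raise_for_status()
            return len(resp.json())
        except requests.RequestException as exc:
            # Re-enqueue the task; Celery waits default_retry_delay seconds.
            raise self.retry(exc=exc)

With a result backend configured, task status can be tracked through Celery's AsyncResult.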

Required Skills:
ETL & Data Processing: Expertise in Apache Spark, AWS Glue, Google Dataflow, and Talend.
Big Data Tools: Proficient with Hadoop, Kafka, Apache Flink, Hive, Presto.
Databases: Strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.
Machine Learning: Hands-on with TensorFlow, PyTorch, Scikit-learn, XGBoost.
Cloud Platforms: Experience with AWS, Azure, GCP, Databricks.
Task Management: Familiar with Celery, RQ, RabbitMQ, Kafka.
Version Control: Git for source code management.
Desirable Skills:
Real-time Data Processing: Experience with Apache Pulsar and Google Cloud Pub/Sub.
Data Warehousing: Familiarity with Redshift, BigQuery, Synapse Analytics.
Scalability Optimization: Knowledge of load balancing (NGINX, HAProxy) and parallel processing.
Data Governance: Use of MLflow, DVC, or other tools for model and data versioning (sketch below).
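
On the versioning point, a minimal MLflow sketch; the run name, parameter, and metric values are placeholders, and runs land in a local ./mlruns store by default.

    import mlflow

    # Record one versioned experiment run with its config and result.
    with mlflow.start_run(run_name="etl-model-v1"):
        mlflow.log_param("n_estimators", 100)  # placeholder hyperparameter
        mlflow.log_metric("rmse", 0.42)        # placeholder evaluation metric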
 
Tools & Technologies:
ETL: Apache Spark, Talend, AWS Glue, Google Dataflow.
Big Data: Hadoop, Kafka, Apache Flink, Presto.
Databases: MySQL, PostgreSQL, MongoDB, Cassandra.
Cloud: AWS, GCP, Azure, Databricks.
Storage: S3, BigQuery, Redshift, Synapse Analytics, HDFS.
Version Control: Git.

Thanks & Regards

Mohd Irfan

Technical Recruiter


