Role: AWS Data Architect SME
Location: Remote
Duration: 12+ Months
Note: Candidate should have hands-on experience in:
- Databricks on AWS
- Data Modeling & Design
- PySpark scripts and SQL
- Unity Catalog and security design
- Identity federation
- Auditing and observability (system tables, APIs, external tools)
- Access control / governance in Unity Catalog
- External locations & storage credentials
- Personal access tokens & service principals
- Metastore & Unity Catalog concepts
- Interactive vs. production workflows
- Policies & entitlements
- Compute types (incl. UC and non-UC, scaling, optimization)
Data Strategy & Architecture Development
Mandatory Key Skills:
- Databricks & Spark Expertise
· Strong knowledge of Databricks Lakehouse architecture (Delta Lake, Unity Catalog, Photon Engine).
· Expertise in Apache Spark (PySpark, Scala, SQL) for large-scale data processing.
· Experience with Databricks SQL and Delta Live Tables (DLT) for real-time and batch processing.
· Understanding of Databricks Workflows, Job Clusters, and Task Orchestration (see the PySpark sketch below).
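As a reference point for this skill area, a minimal PySpark sketch of a Delta Lake batch step; paths and table names are hypothetical, and it assumes a Databricks notebook where `spark` is predefined:

```python
from pyspark.sql import functions as F

# Assumes a Databricks notebook, where `spark` is predefined.
# Hypothetical bronze-to-gold batch step on Delta Lake:
events = spark.read.format("delta").load("/mnt/raw/events")

daily = (
    events
    .groupBy(F.to_date("event_ts").alias("event_date"), "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Overwrite the aggregate as a managed Delta table.
daily.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_events")
```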
- Cloud & Infrastructure Knowledge
· Hands-on experience with Databricks on AWS, Azure, or GCP (AWS Databricks preferred).
· Strong understanding of cloud storage (ADLS, S3, GCS) and cloud networking (VPC, IAM, PrivateLink).
· Experience with Infrastructure as Code (Terraform, ARM, CloudFormation) for Databricks setup (a small verification sketch follows this list).
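For illustration only, a small boto3 sketch for sanity-checking the AWS resources a Databricks deployment typically depends on; the bucket and role names are placeholders, and actual provisioning would normally live in Terraform or CloudFormation:

```python
import boto3

# Placeholder names; substitute the bucket and instance-profile role
# your Databricks workspace is actually configured with.
BUCKET = "my-databricks-root-bucket"
ROLE = "databricks-s3-access-role"

s3 = boto3.client("s3")
iam = boto3.client("iam")

# head_bucket raises a ClientError if the bucket is missing or inaccessible.
s3.head_bucket(Bucket=BUCKET)

# Confirm the IAM role behind the workspace's instance profile exists.
role = iam.get_role(RoleName=ROLE)
print("Role ARN:", role["Role"]["Arn"])
```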
- Data Modeling & Architecture
· Expertise in data modeling (Dimensional, Star Schema, Snowflake, Data Vault).
· Experience with Lakehouse, Data Mesh, and Data Fabric architectures.
· Knowledge of data partitioning, indexing, caching, and query optimization (see the example below).
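A minimal sketch of the kind of star-schema build and partitioning this calls for, assuming hypothetical Delta tables and a Databricks-provided `spark` session:

```python
# Hypothetical star schema: join a fact table to a conformed date dimension,
# then persist it partitioned on the most common filter column.
fact = spark.read.format("delta").load("/mnt/silver/orders")
dim_date = spark.read.format("delta").load("/mnt/silver/dim_date")

orders_star = (
    fact.join(dim_date, fact.order_date == dim_date.date_key, "left")
        .select("order_id", "customer_sk", "product_sk",
                "date_key", "order_amount")
)

(orders_star.write.format("delta")
    .mode("overwrite")
    .partitionBy("date_key")   # enables coarse-grained partition pruning
    .saveAsTable("gold.fact_orders"))
```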
- ETL/ELT & Data Integration
· Experience designing scalable ETL/ELT pipelines using Databricks, Informatica, MuleSoft, or Apache NiFi.
· Strong knowledge of batch and streaming ingestion (Kafka, Kinesis, Event Hubs, Auto Loader).
· Expertise in Delta Lake & Change Data Capture (CDC) for real-time updates (sketched below).
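As a sketch of the ingestion pattern named above: Auto Loader feeding a CDC-style MERGE upsert. Paths, table, and key column are placeholders, and it assumes a Databricks runtime, since the `cloudFiles` source is Databricks-specific:

```python
from delta.tables import DeltaTable

# Databricks Auto Loader: incrementally ingest new JSON files from S3.
updates = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/customers")
    .load("s3://my-bucket/raw/customers/")   # placeholder path
)

# CDC-style upsert: apply each micro-batch to the target with MERGE.
def upsert_batch(batch_df, batch_id):
    target = DeltaTable.forName(spark, "silver.customers")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(updates.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/mnt/checkpoints/customers")
    .start())
```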
- Data Governance & Security
· Deep understanding of Unity Catalog, RBAC, and ABAC for data access control.
· Experience with data lineage, metadata management, and compliance (HIPAA, GDPR, SOC 2).
· Strong skills in data encryption, masking, and role-based access control (RBAC) (see the SQL grants below).
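A brief sketch of Unity Catalog access control expressed as SQL; the catalog, schema, table, group, and masking-function names are all placeholders, and the masking function is assumed to be created beforehand:

```python
# Runs on a Unity Catalog-enabled Databricks cluster or SQL warehouse.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.gold.fact_orders TO `data_analysts`")

# Column-level masking for PII; `main.security.mask_email` is a UC SQL
# function assumed to exist already.
spark.sql("""
    ALTER TABLE main.gold.customers
    ALTER COLUMN email SET MASK main.security.mask_email
""")
```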
- Performance Optimization & Cost Management
· Ability to optimize Databricks clusters (DBU usage, autoscaling, Photon Engine) for cost efficiency.
· Knowledge of query tuning, caching, and performance profiling (a short example follows this list).
· Experience monitoring Databricks job performance using Ganglia, CloudWatch, or Azure Monitor.
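A short sketch of routine Delta/Spark tuning steps, with a placeholder table name; assumes a Databricks notebook:

```python
# Compact small files and co-locate rows on a frequent filter column.
spark.sql("OPTIMIZE main.gold.fact_orders ZORDER BY (customer_sk)")

# Profile a typical query's physical plan before and after tuning.
df = spark.table("main.gold.fact_orders").filter("date_key = 20240101")
df.explain(mode="formatted")

# Cache only hot, reused intermediates, and release them when done.
hot = df.cache()
print(hot.count())   # materializes the cache
hot.unpersist()
```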
- AI/ML & Advanced Analytics
· Experience integrating Databricks MLflow for model tracking and deployment (see the sketch below).
· Knowledge of AI-driven analytics, Genomics, and Drug Discovery in life sciences.
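A minimal MLflow tracking sketch; the toy dataset and model are purely illustrative, and on Databricks the tracking URI and experiment are configured automatically:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X, y)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Log the fitted model as a run artifact for later deployment.
    mlflow.sklearn.log_model(model, "model")
```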
Awaiting your quick response. Thanks!
Ankit Jaiswal
Empower Professionals
……………………………………………………………………………………………………………………..
Ankit@empowerprofessionals.com