- R&D
Software Data Engineer, Data Platform
Our mission is to provide manufacturers and other industrial sectors with insights into the health of machines, processes, and operations to transform how people work and what they can create. A leader in Machine Health and Process Health solutions, Augury uses purpose-built AI technology, trained by industry experts and the world’s largest data library, to help companies realize the full potential of their production. Together with our customers, we are pioneering Production Health by removing friction created by competing business goals so companies can improve business outcomes, empower their workforce, and achieve sustainable production, all at the same time.
About The Position
You are a Software Data Engineer with deep experience building data-intensive systems, not a traditional ETL or BI-focused Data Engineer. In this role, you will design and build production-grade data services, platforms, and pipelines that power DIH and our AI-driven products. You will combine strong software engineering fundamentals with modern data engineering practices, with a focus on clean architecture, reliability, scalability, observability, and testing.
As a Software Data Engineer, Data Platform, you will:
- Build and evolve Python-based services and pipelines that ingest raw industrial events, store them reliably, and expose clean, well-modeled tables and APIs for downstream consumers, including Digital Twin, Smart Canvas, AI agents, and analytics.
- Design systems that handle duplicates, invalid data, late-arriving events, and reprocessing in a principled, incremental, and reproducible manner.
- Collaborate with platform, machine learning, and product teams across Israel and globally to transform complex data challenges into robust, observable, and scalable software solutions.
A Day in Your Life
Production Data Systems & Pipelines
- Design and implement end-to-end data flows, from raw event ingestion into durable storage to modeled datasets and aggregates that power products, Digital Twin capabilities, analytics, and AI agents.
- Build idempotent pipelines that can safely re-run without corrupting data, using deterministic keys and clearly defined contracts between raw, curated, and modeled datasets.
- Implement incremental aggregations (e.g., machine signal summaries, production metrics, and operational KPIs) that correctly account for late-arriving data, watermarking strategies, and reproducibility requirements.
- Model relationships and context across machines, lines, factories, sensors, work orders, and operational events to support context-aware applications, knowledge graphs, and AI agents.
- Partner with platform teams to define how datasets are stored within our lakehouse, Digital Twin, and context graph architectures and exposed through well-defined APIs and tools.
Software Engineering & Data Quality
- Write clean, maintainable Python services with clear separation of concerns across ingestion, validation, transformation, persistence, aggregation, and orchestration layers.
- Apply strong data modeling and SQL fundamentals, including schema design, indexing strategies, event-time semantics, and scalable aggregation patterns.
- Drive testing discipline across the data platform, including unit tests, data-quality tests, integration tests, and validation frameworks.
- Design for observability through metrics, logging, tracing, and monitoring that simplify debugging, improve data quality visibility, and support production operations.
- Troubleshoot and resolve production data issues, including incorrect aggregations, missing data, duplicate records, schema evolution challenges, and backfill operations.
Streaming, Lakehouse & Scalability
- Build and evolve systems that scale from local development environments to cloud-scale lakehouse architectures using technologies such as Databricks, Delta Lake, and Spark.
- Design and implement data pipelines following modern lakehouse patterns, including Bronze, Silver, and Gold layers, partitioning strategies, and cost-efficient compute utilization.
- Work with streaming and messaging platforms (Kafka, Pub/Sub, or similar) to build reliable, idempotent consumers, replay capabilities, and reprocessing workflows.
- Contribute to multi-tenant data architectures, data contracts, and governance practices that enable secure and efficient access to customer data at scale.
Collaboration & AI-Native Experiences
- Work closely with DIH, Smart Canvas, and AI teams to define how agents interact with structured data, context graphs, APIs, and tools in deterministic and reliable ways.
- Translate product requirements and user needs into technical designs that balance correctness, performance, latency, cost, and long-term maintainability.
- Participate in architecture reviews, design discussions, code reviews, and collaborative development practices that raise the overall engineering bar across the organization.
- Help shape the future of AI-native experiences by building the data foundations that power intelligent applications and agentic workflows.
What You Bring
- Bachelor's degree in Computer Science, Software Engineering, Data Engineering, Information Systems, or a related engineering discipline, or equivalent practical experience.
- 5+ years of professional software engineering experience, including substantial experience building backend systems, distributed systems, or data-intensive applications in production environments.
- Strong Python engineering skills, including modular architecture, dependency management, testing practices, observability, and production-grade code quality.
- Strong SQL and data modeling expertise, including schema design, indexing strategies, event-driven data models, and scalable analytical aggregations.
- Hands-on experience building incremental and idempotent data pipelines that handle duplicate, invalid, and late-arriving events without impacting downstream consumers.
- Experience with at least one major cloud platform (Azure, GCP, or AWS) and modern lakehouse technologies such as Databricks, Delta Lake, Spark, or equivalent architectures.
- Experience with streaming or messaging technologies such as Kafka, Pub/Sub, Event Hubs, or similar event-driven systems.
- Proven ability to diagnose and resolve production data issues, including data quality problems, schema evolution, backfills, replay scenarios, and performance bottlenecks.
- Strong written and verbal communication skills in English and experience collaborating effectively with globally distributed teams.
Nice to Have
- Experience building industrial, IoT, manufacturing, or operational data platforms.
- Familiarity with Digital Twin architectures and industrial data models.
- Experience with graph databases, context graphs, knowledge graphs, or relationship-centric data modeling.
- Exposure to AI/LLM-powered applications, including retrieval-augmented generation (RAG), agents, tool calling, or evaluation frameworks.
- Experience working with Databricks or similar lakehouse platforms from both application and platform perspectives.
- Experience building data products that directly support AI agents, intelligent applications, or machine learning workflows.
Perks
- Stock options
- Paid parental leave
- Flex PTO
Augury is a people-first organization. We believe in fostering an inclusive environment in which employees feel encouraged to share their unique perspectives, leverage their strengths, and act authentically. We know that diverse teams are strong teams, and we welcome those from all backgrounds and varying experiences. We are committed to providing employees with a work environment free of discrimination and harassment. We believe that diversity is more than just good intentions, and we are committed to creating an inclusive environment for all employees.
Augury is a proud equal opportunity employer. We strive to create a work environment in which everyone, including all applicants, employees, customers, guests, and vendors, feels safe and comfortable. We commit to maintaining a workplace that is free of any type of harassment and that does not tolerate anyone intimidating, humiliating, or hurting others. We prohibit willful discrimination based on age, gender, ethnicity, race, color, religion, political opinions, sexual orientation, sexual identity or expression, military or veteran status, disability, or any other characteristic protected by law.