Data Engineer
Main Purpose of the Job
A Data Engineer uses information modeling and programming skills to clean, prepare, and optimize data so that Data Scientists and UX/Visual experts can derive insights from it. The role combines cognitive computing and advanced analytics technologies, such as modifying open-source tools to incorporate cognition, with traditional data engineering and applies them to data sourced for specific engagements.
The Data Engineer is a software engineer who designs and builds data integrations across various sources and writes and manages complex queries, so that data is easily accessible and the company’s data ecosystem performs well.
Candidates should have a strong data management background, experience handling both structured and unstructured data, and the ability to transform and analyze data using various tools or scripting.
If initial proof of value/proof of concept (POV/POC) engagements prove the business case, the Data Engineer also collaborates with IT Solution Architects to embed the successful, value-generating pathfinder models into operations and helps design them as key components of industrialized solutions.
Key Outputs
Contribution to IT Strategy by facilitating exploration through POC/POV, under the supervision and guidance of the primary Community of Practice Lead and Product Group Manager based in Switzerland
- Articulates a vision and roadmap for the ingestion, cleansing, staging, harmonization, and exploitation of data as a valued corporate asset, in alignment with existing functional priorities, to help Product Managers explore new ways to solve complex business problems
- Works on data requirements (provided by UX and Data Scientists) that will be used to train and develop models and algorithms to solve business challenges
- As part of a POC/POV, creates data ingestion strategies, prepares data, assists in variable creation, develops information models or data staging strategies, and performs necessary data cleansing activities
- Manages the data lifecycle during the POC/POV and starts creating strategies to embed it into an industrialized model or service operations
- Ensures data is managed in a secure and compliant way, even during the POC/POV, to avoid potential risks
- Works with lead markets, functions, and GMB/RMB to conduct the POC/POV and bring it to closure
Operational Effectiveness and Efficiency by helping industrialize proven models
- Supports product teams and Solution Architects in industrializing information models proven during the POC/POV by devising data collection procedures that include relevant information for building analytic systems
- Assists Solution Architects in developing processes and tools to continuously monitor information model performance and data accuracy
- Helps technical specialists design better descriptive and prescriptive analytics solutions by providing the foundation for semantic models that can be used to visualize information and develop reports on data analysis results to facilitate new KPI/PPI discussions
- Helps technical specialists in API/interfacing technologies better understand how to acquire data and build ingestion layers for industrialized information models
- Promotes the use of services rather than full automation where manual intervention is more appropriate based on cost-benefit analysis
Stakeholder Engagement
- Influences information architects on what should be part of the company’s core data assets and what has repeat value
- Shares best practices with analytics and product teams and facilitates market enablement for similar initiatives
- Collaborates with stakeholders across the organization to identify opportunities to leverage company data to drive business solutions, and provides data sourcing advice
- Influences product teams, including Solution Architects, through presentations of data-based recommendations for evolving operational solutions with new and enhanced models, including effective semantic models and API connectors
- Champions best practices for data management across delivery and recipient organizations
Key Experiences
- Master’s degree in Computer Science, Engineering, or Management Information Systems
- 5+ years of experience in information modeling and data engineering
- Ability to architect highly scalable distributed systems using open-source tools and big data technologies (such as Hadoop, HBase, Spark, Impala, Storm, etc.) integrated with other open-source or proprietary tools available through the Azure Marketplace, especially Cortana Intelligence components
- Experience in cloud-based agile and DevOps environments with PaaS and IaaS
- Experience using big data batch and streaming tools
- Experience with SQL, NoSQL, relational database design (SAP HANA is a plus), efficient data retrieval methods, and data preparation/wrangling both on demand and in industrialized environments
- Ability to gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.)
- Programming experience in Python, Scala, R, Java, and SQL (PowerShell and C# are an advantage)
- Experience with basic and advanced data visualization: simple displays (e.g., Hue), use of notebooks (e.g., Jupyter, Zeppelin), and building reports and dashboards (e.g., Power BI, SAP BO suite)
- Demonstrated ability to work with minimal supervision
- Strong problem-solving skills with an emphasis on product development
- Effective communication skills across different organizational levels and proficiency in English
- Experience working in a global environment and with virtual teams
Makati, PH, 1224