IBM Big Data Engineer Certification Practice Tests 2021
Description
About IBM Big Data Engineer
This certification is intended for IBM Big Data Engineers. The Big Data Engineer works directly with the Data Architect and hands-on developers to convert the architect's Big Data vision and blueprint into a Big Data reality. The Data Engineer possesses deep technical knowledge and experience across a wide array of products and technologies.
Prerequisites for the exam
• Understand the data layer and its particular areas of potential challenge/risk
• Ability to translate functional requirements into technical specifications
• Ability to take an overall solution/logical architecture and provide a physical architecture
• Understand Cluster Management
• Understand Network Requirements
• Understand Important interfaces
• Understand Data Modeling
• Ability to identify/support non-functional requirements for the solution
• Understand Latency
• Understand Scalability
• Understand High Availability
• Understand Data Replication and Synchronization
• Understand Disaster Recovery
• Understand Overall performance (Query Performance, Workload Management, Database Tuning)
• Propose recommended and/or best practices regarding the movement, manipulation, and storage of data in a big data solution, including but not limited to:
• Understand Data ingestion technical options
• Understand Data storage options and ramifications (for example, understand the additional requirements and challenges introduced by data in the cloud)
• Understand Data querying techniques & availability to support analytics
• Understand Data lineage and data governance
• Understand Data variety (social, machine data) and data volume
• Understand, implement, and provide guidance on data security to support the implementation, including but not limited to:
- Understand LDAP Security (a minimal bind sketch follows this list)
- Understand User Roles/Security
- Understand Data Monitoring
- Understand Personally Identifiable Information (PII) Data Security considerations
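The LDAP item above is easiest to see with a concrete bind. Below is a minimal sketch in Python, assuming the third-party ldap3 package and placeholder server and DN values; none of these details come from the exam material.

```python
# Hypothetical sketch: authenticate a user against an LDAP directory
# before granting cluster access. Server, DN, and password are placeholders.
from ldap3 import ALL, Connection, Server

server = Server("ldap://ldap.example.com", get_info=ALL)
conn = Connection(
    server,
    user="uid=jdoe,ou=people,dc=example,dc=com",  # placeholder bind DN
    password="changeit",                          # placeholder credential
    auto_bind=True,  # raises LDAPBindError if the bind fails
)
print("authenticated:", conn.bound)
conn.unbind()
```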
Course Outline
1. Data Loading
• Load unstructured data into InfoSphere BigInsights
• Import streaming data into Hadoop using InfoSphere Streams
• Create a BigSheets workbook
• Import data into Hadoop and create Big SQL table definitions
• Import data to HBase
• Import data to Hive
• Use Data Click to load from relational sources into InfoSphere BigInsights with a self-service process
• Extract data from a relational source using Sqoop (a sketch follows this list)
• Load log data into Hadoop using Flume
• Insert data via IBM General Parallel File System (GPFS) POSIX file system API
• Load data with Hadoop command line utility
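For the Sqoop item above, here is a minimal sketch of a relational extract, driving the sqoop CLI from Python. The JDBC URL, credentials, table, and HDFS paths are placeholders, not exam content.

```python
# Hypothetical sketch: run a Sqoop import of one relational table into HDFS.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:db2://dbhost:50000/SAMPLE",      # placeholder JDBC URL
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/.db2.password",
    "--table", "SALES",                                  # placeholder table
    "--target-dir", "/user/etl/sales",                   # HDFS landing dir
    "--num-mappers", "4",                                # parallel extract tasks
]

result = subprocess.run(sqoop_cmd, capture_output=True, text=True)
if result.returncode != 0:
    raise RuntimeError(f"Sqoop import failed:\n{result.stderr}")
print("Import into /user/etl/sales complete")
```

Using --password-file rather than --password keeps the credential out of the process list, a common hardening step.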
2. Data Security
• Keep data secure in accordance with PCI standards
• Use masking (e.g., Optim, Big SQL) and redaction to protect sensitive data (a minimal redaction sketch follows)
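Below is a minimal redaction sketch in Python using simple regex patterns. It illustrates the concept only; it is not how Optim or Big SQL masking works.

```python
# Hedged sketch: regex-based scrubbing of common PII patterns
# before records land in the cluster. Patterns are illustrative.
import re

PII_PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(record: str) -> str:
    """Replace recognised PII tokens with a fixed placeholder."""
    for name, pattern in PII_PATTERNS.items():
        record = pattern.sub(f"[REDACTED-{name.upper()}]", record)
    return record

print(redact("Contact jane.doe@example.com, SSN 123-45-6789"))
# -> Contact [REDACTED-EMAIL], SSN [REDACTED-SSN]
```

In a real deployment, the rules would come from a governance catalog rather than being hard-coded.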
3. Architecture and Integration
• Implement MapReduce (a streaming sketch follows this list)
• Evaluate use cases for selecting Hive, Big SQL, or HBase
• Create and/or query a Solr index
• Evaluate use cases for selecting potential file formats (e.g. JSON, CSV, Parquet, Sequence, etc.)
• Utilize Hue for search visualization
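For the MapReduce item above, one language-agnostic way to implement a job is Hadoop Streaming, which runs any stdin/stdout program as the mapper and reducer. A word-count sketch in Python follows; word count is the illustration, not an exam requirement.

```python
# wc.py -- mapper and reducer for Hadoop Streaming in one file.
import sys

def mapper():
    # Emit "word<TAB>1" for every token read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Streaming sorts mapper output by key, so all counts for a
    # given word arrive on consecutive lines.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").rsplit("\t", 1)
        if word != current and current is not None:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

A typical (placeholder) invocation: hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper "python3 wc.py map" -reducer "python3 wc.py reduce"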
4. Performance and Scalability
• Use Resilient Distributed Datasets (RDDs) in Spark to improve performance over MapReduce (a sketch follows this list)
• Choose file formats to optimize performance of Big SQL, Jaql, etc.
• Make specific performance tuning decisions for Hive and HBase
• Analyze performance considerations when using Apache Spark
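For the RDD item above, here is a minimal PySpark sketch: a word count persisted in executor memory so that later actions avoid re-reading HDFS, which is the core performance advantage over chained MapReduce jobs. The input path is a placeholder.

```python
# Hedged PySpark sketch: RDD word count with in-memory caching.
from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="rdd-wordcount")

lines = sc.textFile("hdfs:///user/etl/input")   # placeholder HDFS path
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))

# persist() keeps the computed RDD in executor memory, so the second
# action below does not recompute the whole lineage from HDFS.
counts.persist(StorageLevel.MEMORY_ONLY)

print("distinct words:", counts.count())
print("top 5:", counts.takeOrdered(5, key=lambda kv: -kv[1]))

sc.stop()
```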
5. Data Preparation, Transformation, and Export
• Use Jaql query methods to transform data in InfoSphere BigInsights
• Capture and prep social data for analytics (a flattening sketch follows this list)
• Integrate SPSS model scoring in InfoSphere Streams
• Implement entity resolution within a Big Data platform (e.g. Big Match)
• Utilize Pig for data transformation and data manipulation
• Use Big SQL to transform data in InfoSphere BigInsights
• Export processing results out of Hadoop (e.g., Data Click, DataStage, etc.)
• Utilize consistent regions in InfoSphere Streams to ensure at-least-once processing
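For the social-data item above, here is a minimal Python sketch that flattens line-delimited JSON posts into tab-separated rows suitable for a Hive or Big SQL external table. The field names (id, user.screen_name, created_at, text) are assumptions for illustration, not part of the exam outline.

```python
# Hedged sketch: prep raw social posts (one JSON document per line)
# into delimited rows for downstream analytics.
import json
import sys

def flatten(post: dict) -> str:
    """Project the fields an analytics table needs; tolerate gaps."""
    text = post.get("text", "").replace("\t", " ").replace("\n", " ")
    return "\t".join([
        str(post.get("id", "")),
        post.get("user", {}).get("screen_name", ""),
        post.get("created_at", ""),
        text,
    ])

for raw in sys.stdin:
    try:
        print(flatten(json.loads(raw)))
    except json.JSONDecodeError:
        continue  # skip malformed records rather than failing the job
```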
Who this course is for:
- All Levels