Big Data Testing

Big Data Testing Overview

  • Testing a Big Data application is more a verification of its data processing than a test of the individual features of the software product.
  • When it comes to Big Data testing, performance and functional testing are key.
  • In Big Data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supportive components.
  • It demands a high level of testing skill, as the processing is very fast. Processing may be of three types: batch, real-time, and interactive.
  • Along with this, data quality is also an important factor in Big Data testing. Before testing the application, it is necessary to check the quality of the data, and this should be considered part of database testing. It involves checking characteristics such as conformity, accuracy, duplication, consistency, validity, and completeness. A minimal sketch of such checks follows this list.
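
The sketch below illustrates a few of those pre-test quality checks in Python, assuming records arrive as Python dictionaries; the field names, schema, and sample data are hypothetical, not part of any standard tool.

# Minimal sketch of pre-test data quality checks (hypothetical schema).
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quality_report(records, required_fields=("id", "email", "amount")):
    """Check completeness, conformity/validity, and duplication."""
    seen_ids = set()
    issues = {"incomplete": 0, "invalid_email": 0, "duplicate_id": 0}
    for rec in records:
        # Completeness: every required field must be present and non-empty.
        if any(not rec.get(f) for f in required_fields):
            issues["incomplete"] += 1
        # Conformity/validity: email must match the expected pattern.
        if rec.get("email") and not EMAIL_RE.match(rec["email"]):
            issues["invalid_email"] += 1
        # Duplication: ids must be unique across the data set.
        if rec.get("id") in seen_ids:
            issues["duplicate_id"] += 1
        seen_ids.add(rec.get("id"))
    return issues

if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com", "amount": 10},
        {"id": 1, "email": "bad-email", "amount": None},
    ]
    print(quality_report(sample))
    # {'incomplete': 1, 'invalid_email': 1, 'duplicate_id': 1}

In practice the same checks would run as distributed jobs over the full data set rather than in-memory loops, but the characteristics being verified are the same.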




Big Data Facts

According to analyst firm Gartner, "The average organization loses $8.2 million annually through poor Data Quality". And yet, according to the Experian Data Quality report, "99% of organizations have a data quality strategy in place".

This is troubling: data quality strategies are nearly universal, yet they are not catching the bad data that already exists. This is a problem that needs to be solved.





Big Data Testing Steps

  • Data staging validation
  • MapReduce validation
  • Output validation (see the sketch after this list)
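
One simple way to tie these phases together is to reconcile record counts as data moves from the source system into staging and out of MapReduce. The sketch below assumes the data lands in HDFS; "hdfs dfs -cat" is the standard Hadoop shell command, but the paths and the hard-coded source count are purely illustrative.

# Reconciling record counts across the three validation phases.
# The HDFS paths and the source-system count are hypothetical.
import subprocess

def hdfs_line_count(path):
    """Count records under an HDFS path via the standard 'hdfs dfs -cat' shell."""
    result = subprocess.run(["hdfs", "dfs", "-cat", path],
                            capture_output=True, text=True, check=True)
    return len(result.stdout.splitlines())

def check(phase, expected, actual):
    status = "PASS" if expected == actual else "FAIL"
    print(f"{phase}: expected={expected} actual={actual} -> {status}")

source_count = 1_000_000                           # count reported by the source system
staged = hdfs_line_count("/staging/orders")        # data staging validation
output = hdfs_line_count("/output/orders/part-*")  # MapReduce/output validation

check("staging", source_count, staged)
check("output", staged, output)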




Architecture Testing

  • Big Data systems process very large volumes of data and are highly resource-intensive. Hence, architecture testing is crucial to the success of a Big Data project.
  • A poorly or improperly designed system can lead to performance degradation, and the system may fail to meet requirements.
  • Performance and failover test services should be carried out in a Big Data environment.
  • Performance testing covers job completion time, memory utilization, data throughput, and similar system metrics, while failover testing verifies that data processing continues seamlessly if data nodes fail (both are sketched below).
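
A completion-time and throughput check can be scripted around the job launcher itself. The sketch below is a generic Python harness assuming a "hadoop jar" invocation; the jar name, paths, input size, and time budget are hypothetical.

# Measuring job completion time and data throughput (illustrative values).
import subprocess
import time

def run_and_measure(cmd, input_bytes, max_seconds):
    """Run a job, report throughput, and assert the completion-time budget."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)      # blocks until the job finishes
    elapsed = time.monotonic() - start
    throughput_mb_s = input_bytes / elapsed / (1024 ** 2)
    print(f"completed in {elapsed:.1f}s, throughput {throughput_mb_s:.1f} MB/s")
    assert elapsed <= max_seconds, "job exceeded its completion-time budget"

# Hypothetical job: 10 GB of input that must finish within 30 minutes.
run_and_measure(["hadoop", "jar", "job.jar", "WordCount", "/in", "/out"],
                input_bytes=10 * 1024 ** 3, max_seconds=1800)

A failover variant of the same harness would stop a data node mid-run (for example with "hadoop-daemon.sh stop datanode" on one worker) and assert that the job still completes and the output counts still reconcile.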




Testing Environment

  • Test environment needs depend on the type of application being tested. For Big Data testing, the test environment should encompass the following:
  • Enough space to store and process large amounts of data
  • A cluster with distributed nodes and data
  • Minimal baseline CPU and memory utilization, to keep performance high (a readiness check covering these points is sketched below)
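
A pre-run readiness check for such an environment might look like the following sketch; it uses Python's standard shutil module plus the third-party psutil package, and the data path and thresholds are illustrative.

# Pre-run environment check: storage headroom and baseline utilization.
# The data path and thresholds are illustrative, not standard values.
import shutil
import psutil  # third-party: pip install psutil

def environment_ready(data_path="/data", min_free_tb=5.0,
                      max_cpu_pct=20.0, max_mem_pct=30.0):
    free_tb = shutil.disk_usage(data_path).free / (1024 ** 4)
    cpu_pct = psutil.cpu_percent(interval=1)   # sample CPU over one second
    mem_pct = psutil.virtual_memory().percent
    print(f"free={free_tb:.1f} TB cpu={cpu_pct:.0f}% mem={mem_pct:.0f}%")
    # Enough storage, and low baseline load so the test measures the system
    # under test rather than background noise.
    return (free_tb >= min_free_tb and cpu_pct <= max_cpu_pct
            and mem_pct <= max_mem_pct)

if not environment_ready():
    raise SystemExit("environment does not meet Big Data testing prerequisites")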




Tools


Big Data Cluster and Big Data Tools

  • NoSQL: CouchDB, MongoDB, Cassandra, Redis, ZooKeeper, HBase
  • MapReduce: Hadoop, Hive, Pig, Cascading, Oozie, Kafka, S4, MapR, Flume
  • Storage: S3, HDFS (Hadoop Distributed File System)
  • Servers: Elastic, Heroku, Google App Engine, EC2
  • Processing: R, Yahoo! Pipes, Mechanical Turk, BigSheets, Datameer