Exploring Big Data with Prescriptive Analytics

Big Data brings both opportunities and challenges for prescriptive analytics. Its volume generally improves the quality and accuracy of management science models, and good management science models can speed the flow of information by delivering quicker decisions and improving operational business intelligence. Variety, by contrast, tends to hinder the implementation of management science techniques, although the right technological framework can mitigate its negative impact.

Information technology (IT) is probably the most important ingredient in the Big Data recipe. Over the last decade, changes in IT have significantly altered the nature of Big Data. In the past, companies built database applications and planned server capacity to store only a few years’ worth of data, simply deleting transaction data older than three to five years. Today, data storage is relatively inexpensive, so companies can afford to retain the data and information generated by these transactions.

Automatic identification and data capture technologies are rapidly developing, allowing objects to be identified faster, data about them to be collected better, and that data to be entered directly into computer systems without human involvement. Such technologies include bar codes, radio frequency identification, magnetic stripes, biometrics, optical character recognition, smart cards, cameras, and voice recognition. Finally, laws and regulations have been enacted for data storage, processing, security, transparency, and auditing (for example, the Sarbanes–Oxley Act of 2002 23), making it mandatory for organizations to retain such information. All these factors have contributed to the rise of the era of Big Data.

Table 1-1. Challenges of Big Data to Optimization Models

Volume
Challenges: Managing large and rapidly increasing data sources
Technology-Based Solutions: Advanced software programs able to process large numbers of constraints and decision variables
Methodology-Based Solutions: Standardize the ETL processes to automatically capture and process input parameters; encourage system-driven versus user-driven optimization programs

Variety
Challenges: Dealing with heterogeneity of data sources; dealing with incomplete data sets
Technology-Based Solutions: Relational database systems and declarative query languages to retrieve data input for optimization models; ETL toward specialized optimization-driven data marts
Methodology-Based Solutions: Add data structuring prior to analysis; implement data cleaning and imputation techniques

Velocity
Challenges: Managing large and rapidly changing data sets; reaching on-time optimal solutions for operational business intelligence
Technology-Based Solutions: Advanced optimization software with the capability to reach optimal solutions within a feasible amount of time; optimization packages that directly connect to operational databases
Methodology-Based Solutions: Consider the trade-off between less-than-optimal but time-feasible, practical solutions and optimal but complex and often delayed solutions

Table 1-1 summarizes the challenges of implementing optimization models in the era of Big Data and suggests conceptual approaches for management scientists to deal with them. The high volume of Big Data requires that decision scientists be able to store and process large amounts of data. Cloud computing, which has risen over the past few years, has dramatically increased the ability of businesses to store and process information, offering dynamic, large, distributed platforms on which organizations can process input parameters and solve models at scale. These platforms can run advanced optimization models that engage multiple clusters. For example, advanced linear programming (LP) models have recently entered the paradigm of declarative programming, whose goal is to ease the programmer’s task by separating the control from the logic of a computation.24 Under this paradigm, a set of LP constraints is solved by assigning a value to each variable so that the solution is consistent with the maximum number of constraints. This makes it possible to engage multiple clusters when solving large optimization models, rendering large-scale, computationally heavy models practically feasible to solve. The use of declarative programming to model and solve mathematical programming models is still at an early stage.
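The separation of logic from control can be sketched in miniature: constraints are declared as predicates (the logic), while an independent search routine (the control) looks for the assignment that satisfies as many of them as possible. The variables, domain, and constraints below are illustrative only, not drawn from the chapter.

```python
from itertools import product

# Logic: constraints declared as predicates, with no hint of how to solve them.
constraints = [
    lambda x, y: x + y <= 10,
    lambda x, y: 2 * x + y <= 15,
    lambda x, y: x >= 3,
]

def best_assignment(domain):
    """Control: exhaustively pick the (x, y) pair satisfying the most constraints."""
    return max(
        product(domain, repeat=2),
        key=lambda pair: sum(c(*pair) for c in constraints),
    )

x, y = best_assignment(range(11))
# In this toy instance all three constraints can be satisfied simultaneously.
```

Because the search routine knows nothing about the particular constraints, it can be swapped for a distributed solver without touching the model itself, which is the property that lets large LP models be spread across multiple clusters.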

Apache Hadoop is another good example of using advanced technology to handle the high variety of data in optimization models. Hadoop, an open source platform, offers distributed computing that places no conditions on the structure of the data it processes; as such, it is well suited to mitigating the variety component of Big Data. Google introduced MapReduce,25 a programming model that can process and filter (map), and then merge and organize (reduce), large data sets according to specified criteria. MapReduce programs can be executed automatically and in parallel across several computers, saving processing time while handling large amounts of input data.
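The map and reduce phases can be illustrated with a self-contained word-count sketch; the sample records and the in-memory shuffle step below are simplified stand-ins for work that Hadoop would distribute across many machines.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs -- here, (word, 1) for every word seen."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle: group values by key; reduce: sum each group into one count."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

records = ["Big Data brings opportunities", "Big Data brings challenges"]
counts = reduce_phase(map_phase(records))
# counts["big"] == 2, counts["data"] == 2, counts["challenges"] == 1
```

Since each record is mapped independently and each key is reduced independently, both phases parallelize naturally, which is what allows MapReduce jobs to scale across machines.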
