



Video: Hadoop Fundamentals: Install Hortonworks HDP2.1 Sandbox
By Doug Eadline
Dec 17, 2014
Are you ready to get up and running with Hadoop? One of the first things you'll want to do is install Hortonworks HDP 2.1 Sandbox. Doug Eadline shows you how in this excerpt from Hadoop Fundamentals LiveLessons (Video Training), 2nd Edition.
NoSQL Databases: An Overview
By Pramod J. Sadalage
Dec 11, 2014
Pramod Sadalage provides an overview of NoSQL databases, explaining what NoSQL is, types of NoSQL databases, and why and how to choose a NoSQL database.
HBase Data Analysis with MapReduce
By Steven Haines
Nov 26, 2014
HBase supports two types of read access: table scans by row key and MapReduce jobs. Table scans enable you to retrieve the exact subset of rows you are looking for, and MapReduce jobs enable you to perform analysis across a greater set of data. This article reviews HBase’s support for MapReduce and demonstrates how to create a MapReduce job to analyze 10,000 records in a table.
Big Data Scalability: Why Your Database Is Slow, and When You Should Start Scaling
By Cory Isaacson
Nov 12, 2014
Constant increases in the volume of Big Data worldwide have begun to overwhelm the database management systems on which we all rely. We need a comprehensive method for managing this overflow. Database expert Cory Isaacson, CEO/CTO of CodeFutures and author of Understanding Big Data Scalability, discusses how scaling can help to keep your databases from being overwhelmed, how you'll know when you should start scaling, and the best way to make it happen.
Introduction to HBase, the NoSQL Database for Hadoop: Programming HBase with Java
By Steven Haines
Nov 4, 2014
The first article in this series (“Introduction to HBase”) presented HBase, also known as the Hadoop database, and described how to set up a local environment and manipulate data using the HBase shell. This article continues by demonstrating how to interact with HBase using Java. You learn how to put data into HBase, get data out of HBase, delete data from HBase, and perform a table scan to extract a range of records. Finally, you see how to set up an HBase project using Maven.
Introduction to HBase, the NoSQL Database for Hadoop
By Steven Haines
Oct 27, 2014
HBase is called the Hadoop database because it is a NoSQL database that runs on top of Hadoop. It combines the scalability of Hadoop, gained by running on the Hadoop Distributed File System (HDFS), with real-time data access as a key/value store and the deep analytic capabilities of MapReduce. This article introduces HBase, describes how it organizes and manages data, and then demonstrates how to set up a local HBase environment and interact with data using the HBase shell.
Introduction to Oracle Databases on Virtual Infrastructure
By Kannan Mani, Don Sullivan
Oct 22, 2014
99.9% of all database or data management systems should be considered candidates for virtualization on vSphere. In this chapter from Virtualizing Oracle Databases on vSphere, the authors argue that Oracle databases and software are prime candidates to consider migrating to virtualized infrastructure.
Introducing NoSQL and MongoDB
By Brad Dayley
Sep 18, 2014
In this chapter from NoSQL with MongoDB in 24 Hours, Sams Teach Yourself, learn about the design considerations to review before deciding how to implement the structure of data and configuration of a MongoDB database. You'll also learn which design questions to ask and then how to explore the mechanisms built into MongoDB to answer those questions.
Hit the Ground Running with MongoDB and Python
By Stephen B. Morris
Sep 16, 2014
Stephen B. Morris describes how to get started with MongoDB and Python. As usual with Python, you can get productive quickly, without worrying about complex IDEs. MongoDB has a simple data model and easy-to-understand semantics, giving you a handy on-ramp to this interesting technology.
Introduction to Understanding Big Data Scalability
By Cory Isaacson
Aug 18, 2014
This introduction describes the goals of the Big Data Scalability four-volume series, focusing on the underlying growth of databases (the "data explosion") and providing some background into big data's relevance.
SQL Queries for Mere Mortals: Thinking in Sets
By Michael J. Hernandez, John Viescas
Jul 1, 2014
This chapter introduces the concept of an SQL set. It discusses each of the major set operations implemented in SQL in detail (intersection, difference, and union), and shows how to use set diagrams to visualize the problem you’re trying to solve. Finally, it introduces the basic SQL syntax and keywords (INTERSECT, EXCEPT, and UNION) for all three operations.
Why Big Data and Analytics?
By Brenda L. Dietrich, Maureen F. Norton, Emily C. Plachy
Jun 12, 2014
Get the inside story of how analytics is being used across the IBM enterprise in this introduction to Analytics Across the Enterprise: How IBM Realizes Business Value from Big Data and Analytics.
Ten Tips to Realize Value from Big Data and Analytics
By Brenda L. Dietrich, Maureen F. Norton, Emily C. Plachy
Jun 10, 2014
What does it really take to derive value from Big Data and Analytics? Co-authors of Analytics Across the Enterprise: How IBM Realizes Business Value from Big Data and Analytics, Brenda Dietrich, Emily Plachy, and Maureen Norton, identify 10 top tips based on their years of experience at IBM “eating their own cooking.” Interviews with more than 70 executives, managers, and analytic practitioners across IBM yielded 31 case studies across 9 different business functions. These case studies show the breadth of challenges, outcomes, analytics techniques, and lessons learned that can help make your analytics journey to business value successful.
Preface to Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2
By Arun C. Murthy, Doug Eadline, Jeff Markham, Joseph Niemiec, Vinod Kumar Vavilapalli
May 20, 2014
While the power of YARN is easily comprehensible, the ability to exploit that power requires the user to understand the intricacies of building such a system in conjunction with YARN. This book aims to reconcile that dichotomy, as the authors explain in the preface to Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2.
Apache Hadoop YARN: A Brief History and Rationale
By Arun C. Murthy, Doug Eadline, Jeff Markham, Joseph Niemiec, Vinod Kumar Vavilapalli
Mar 24, 2014
This chapter provides a historical account of why and how Apache Hadoop YARN came about.
Data Just Right Video Tutorials: How to Use Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery
By Michael Manoochehri
Feb 26, 2014
Michael Manoochehri provides viewers with an introduction to implementing practical solutions for common data problems. This excerpt from Data Just Right LiveLessons contains three sample videos: 1. Loading Data into Hive, 2. Writing a Multistep MapReduce Job Using the mrjob Python Library, and 3. Using the Pandas Library for Analyzing Time Series Data.
Data Reshaping in R
By Jared P. Lander
Feb 3, 2014
Jared P. Lander considers two cases: when the data need to be rearranged from column oriented to row oriented (or the opposite), and when the data are in multiple, separate sets and need to be combined into one.
The Basics of Monitoring Cassandra
By Russell Bradberry, Eric Lubow
Jan 28, 2014
This chapter covers the basics of monitoring Cassandra. These include file-based logging, inspection of the JVM, and monitoring of Cassandra itself.
Four Rules for Data Success
By Michael Manoochehri
Jan 1, 2014
Database technology is a fast-moving field filled with innovations. Michael Manoochehri describes the current state of the field and discusses the four rules for data success.
What's New in SQL Server 2012
By Ray Rankins, Paul T. Bertucci, Chris Gallelli, Alex T. Silverstein
Dec 13, 2013
This chapter introduces the major new features provided in SQL Server 2012 and covers a number of the enhancements to previously available features.
