Apache Sqoop Cookbook, 1st Edition, by Kathleen Ting and Jarek Jarcec Cecho
Product details:
ISBN 10: 1449364624
ISBN 13: 9781449364625
Authors: Kathleen Ting, Jarek Jarcec Cecho
Integrating data from multiple sources is essential in the age of big data, but it can be a challenging and time-consuming task. This handy cookbook provides dozens of ready-to-use recipes for using Apache Sqoop, the command-line interface application that optimizes data transfers between relational databases and Hadoop.
Sqoop is both powerful and bewildering, but with this cookbook’s problem-solution-discussion format, you’ll quickly learn how to deploy and then apply Sqoop in your environment. The authors provide MySQL, Oracle, and PostgreSQL database examples on GitHub that you can easily adapt for SQL Server, Netezza, Teradata, or other relational systems.
Transfer data from a single database table into your Hadoop ecosystem
Keep table data and Hadoop in sync by importing data incrementally
Import data from more than one database table
Customize transferred data by calling various database functions
Export generated, processed, or backed-up data from Hadoop to your database
Run Sqoop within Oozie, Hadoop’s specialized workflow scheduler
Load data into Hadoop’s data warehouse (Hive) or database (HBase)
Handle installation, connection, and syntax issues common to specific database vendors
Table of contents:
1. Getting Started
Downloading and Installing Sqoop
Problem
Solution
Discussion
Installing JDBC Drivers
Problem
Solution
Discussion
Installing Specialized Connectors
Problem
Solution
Discussion
Starting Sqoop
Problem
Solution
Discussion
Getting Help with Sqoop
Problem
Solution
Discussion
2. Importing Data
Transferring an Entire Table
Problem
Solution
Discussion
Specifying a Target Directory
Problem
Solution
Discussion
Importing Only a Subset of Data
Problem
Solution
Discussion
Protecting Your Password
Problem
Solution
Discussion
Using a File Format Other Than CSV
Problem
Solution
Discussion
Compressing Imported Data
Problem
Solution
Discussion
Speeding Up Transfers
Problem
Solution
Discussion
See Also
Overriding Type Mapping
Problem
Solution
Discussion
Controlling Parallelism
Problem
Solution
Discussion
Encoding NULL Values
Problem
Solution
Discussion
See Also
Importing All Your Tables
Problem
Solution
Discussion
3. Incremental Import
Importing Only New Data
Problem
Solution
Discussion
Incrementally Importing Mutable Data
Problem
Solution
Discussion
Preserving the Last Imported Value
Problem
Solution
Discussion
Storing Passwords in the Metastore
Problem
Solution
Discussion
Overriding the Arguments to a Saved Job
Problem
Solution
Discussion
Sharing the Metastore Between Sqoop Clients
Problem
Solution
Discussion
4. Free-Form Query Import
Importing Data from Two Tables
Problem
Solution
Discussion
Using Custom Boundary Queries
Problem
Solution
Discussion
Renaming Sqoop Job Instances
Problem
Solution
Discussion
Importing Queries with Duplicated Columns
Problem
Solution
Discussion
5. Export
Transferring Data from Hadoop
Problem
Solution
Discussion
Inserting Data in Batches
Problem
Solution
Discussion
Exporting with All-or-Nothing Semantics
Problem
Solution
Discussion
Updating an Existing Data Set
Problem
Solution
Discussion
Updating or Inserting at the Same Time
Problem
Solution
Discussion
See Also
Using Stored Procedures
Problem
Solution
Discussion
Exporting into a Subset of Columns
Problem
Solution
Discussion
Encoding the NULL Value Differently
Problem
Solution
Discussion
See Also
Exporting Corrupted Data
Problem
Solution
Discussion
6. Hadoop Ecosystem Integration
Scheduling Sqoop Jobs with Oozie
Problem
Solution
Discussion
Specifying Commands in Oozie
Problem
Solution
Discussion
Using Property Parameters in Oozie
Problem
Solution
Discussion
Installing JDBC Drivers in Oozie
Problem
Solution
Discussion
See Also
Importing Data Directly into Hive
Problem
Solution
Discussion
See Also
Using Partitioned Hive Tables
Problem
Solution
Discussion
Replacing Special Delimiters During Hive Import
Problem
Solution
Discussion
Using the Correct NULL String in Hive
Problem
Solution
Discussion
See Also
Importing Data into HBase
Problem
Solution
Discussion
Importing All Rows into HBase
Problem
Solution
Discussion
Improving Performance When Importing into HBase
Problem
Solution
Discussion
7. Specialized Connectors
Overriding Imported boolean Values in PostgreSQL Direct Import
Problem
Solution
Discussion
See Also
Importing a Table Stored in Custom Schema in PostgreSQL
Problem
Solution
Discussion
Exporting into PostgreSQL Using pg_bulkload
Problem
Solution
Discussion
See Also
Connecting to MySQL
Problem
Solution
Discussion
Using Direct MySQL Import into Hive
Problem
Solution
Discussion
See Also
Using the upsert Feature When Exporting into MySQL
Problem
Solution
Discussion
See Also
Importing from Oracle
Problem
Solution
Discussion
Using Synonyms in Oracle
Problem
Solution
Discussion
Faster Transfers with Oracle
Problem
Solution
Discussion
See Also
Importing into Avro with OraOop
Problem
Solution
Discussion
Choosing the Proper Connector for Oracle
Problem
Solution
Discussion
Exporting into Teradata
Problem
Solution
Discussion
See Also
Using the Cloudera Teradata Connector
Problem
Solution
Discussion
See Also
Using Long Column Names in Teradata
Problem
Solution
Discussion
Tags: Kathleen Ting, Jarek Jarcec Cecho, Apache, Sqoop