Parallel Computing for Data Science: With Examples in R, C++ and CUDA, 2nd Edition by Norman Matloff – Ebook PDF
Product details:
ISBN 10: 0367738198
ISBN 13: 9780367738198
Author: Norman Matloff
This is one of the first parallel computing books to focus exclusively on parallel data structures, algorithms, software tools, and applications in data science. It prepares readers to write effective parallel code in several languages and introduces a range of R packages and other tools. The book covers the classic "n observations, p variables" matrix format along with other common data structures, and many examples illustrate the issues that arise in parallel programming.
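For a feel for the style of code the book teaches, here is a minimal illustrative sketch (not taken from the book) of the snow-style API that Chapter 1 introduces; it uses makeCluster() and parLapply() from R's built-in parallel package to run a toy computation on two worker processes:

library(parallel)

cl <- makeCluster(2)                              # launch 2 worker processes
squares <- parLapply(cl, 1:10, function(i) i^2)   # apply the function across the workers
stopCluster(cl)                                   # shut the workers down

unlist(squares)   # 1 4 9 16 25 36 49 64 81 100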
Parallel Computing for Data Science: With Examples in R, C++ and CUDA, 2nd Edition – Table of contents:
1 Introduction to Parallel Processing in R
1.1 Recurring Theme: The Principle of Pretty Good Parallelism
1.1.1 Fast Enough
1.1.2 “R+X”
1.2 A Note on Machines
1.3 Recurring Theme: Hedging One’s Bets
1.4 Extended Example: Mutual Web Outlinks
1.4.1 Serial Code
1.4.2 Choice of Parallel Tool
1.4.3 Meaning of “snow” in This Book
1.4.4 Introduction to snow
1.4.5 Mutual Outlinks Problem, Solution 1
1.4.5.1 Code
1.4.5.2 Timings
1.4.5.3 Analysis of the Code
1.5 Further Reading
2 “Why Is My Program So Slow?”: Obstacles to Speed
2.1 Obstacles to Speed
2.2 Performance and Hardware Structures
2.3 Memory Basics
2.3.1 Caches
2.3.2 Virtual Memory
2.3.3 Monitoring Cache Misses and Page Faults
2.3.4 Locality of Reference
2.4 Network Basics
2.5 Latency and Bandwidth
2.5.1 Two Representative Hardware Platforms: Multicore Machines and Clusters
2.5.1.1 Multicore
2.5.1.2 Clusters
2.5.2 The Principle of “Just Leave It There”
2.6 Thread Scheduling
2.7 How Many Processes/Threads?
2.8 Example: Mutual Outlink Problem
2.9 “Big O” Notation
2.10 Data Serialization
2.11 “Embarrassingly Parallel” Applications
2.11.1 What People Mean by “Embarrassingly Parallel”
2.11.2 Suitable Platforms for Non-Embarrassingly Parallel Applications
2.12 Further Reading
3 Principles of Parallel Loop Scheduling
3.1 General Notions of Loop Scheduling
3.2 Chunking in snow
3.2.1 Example: Mutual Outlinks Problem
3.3 A Note on Code Complexity
3.4 Example: All Possible Regressions
3.4.1 Parallelization Strategies
3.4.2 The Code
3.4.3 Sample Run
3.4.4 Code Analysis
3.4.4.1 Our Task List
3.4.4.2 Chunking
3.4.4.3 Task Scheduling
3.4.4.4 The Actual Dispatching of Work
3.4.4.5 Wrapping Up
3.4.5 Timing Experiments
3.5 The partools Package
3.6 Example: All Possible Regressions, Improved Version
3.6.1 Code
3.6.2 Code Analysis
3.6.3 Timings
3.7 Introducing Another Tool: multicore
3.7.1 Source of the Performance Advantage
3.7.2 Example: All Possible Regressions, Using multicore
3.8 Issues with Chunk Size
3.9 Example: Parallel Distance Computation
3.9.1 The Code
3.9.2 Timings
3.10 The foreach Package
3.10.1 Example: Mutual Outlinks Problem
3.10.2 A Caution When Using foreach
3.11 Stride
3.12 Another Scheduling Approach: Random Task Permutation
3.12.1 The Math
3.12.2 The Random Method vs. Others, in Practice
3.13 Debugging snow and multicore Code
3.13.1 Debugging in snow
3.13.2 Debugging in multicore
4 The Shared-Memory Paradigm: A Gentle Introduction via R
4.1 So, What Is Actually Shared?
4.1.1 Global Variables
4.1.2 Local Variables: Stack Structures
4.1.3 Non-Shared Memory Systems
4.2 Clarity of Shared-Memory Code
4.3 High-Level Introduction to Shared-Memory Programming: Rdsm Package
4.3.1 Use of Shared Memory
4.4 Example: Matrix Multiplication
4.4.1 The Code
4.4.2 Analysis
4.4.3 The Code
4.4.4 A Closer Look at the Shared Nature of Our Data
4.4.5 Timing Comparison
4.4.6 Leveraging R
4.5 Shared Memory Can Bring A Performance Advantage
4.6 Locks and Barriers
4.6.1 Race Conditions and Critical Sections
4.6.2 Locks
4.6.3 Barriers
4.7 Example: Maximal Burst in a Time Series
4.7.1 The Code
4.8 Example: Transforming an Adjacency Matrix
4.8.1 The Code
4.8.2 Overallocation of Memory
4.8.3 Timing Experiment
4.9 Example: k-Means Clustering
4.9.1 The Code
4.9.2 Timing Experiment
4.10 Further Reading
5 The Shared-Memory Paradigm: C Level
5.1 OpenMP
5.2 Example: Finding the Maximal Burst in a Time Series
5.2.1 The Code
5.2.2 Compiling and Running
5.2.3 Analysis
5.2.4 A Cautionary Note About Thread Scheduling
5.2.5 Setting the Number of Threads
5.2.6 Timings
5.3 OpenMP Loop Scheduling Options
5.3.1 OpenMP Scheduling Options
5.3.2 Scheduling through Work Stealing
5.4 Example: Transforming an Adjacency Matrix
5.4.1 The Code
5.4.2 Analysis of the Code
5.5 Example: Adjacency Matrix, R-Callable Code
5.5.1 The Code, for .C()
5.5.2 Compiling and Running
5.5.3 Analysis
5.5.4 The Code, for Rcpp
5.5.5 Compiling and Running
5.5.6 Code Analysis
5.5.7 Advanced Rcpp
5.6 Speedup in C
5.7 Run Time vs. Development Time
5.8 Further Cache/Virtual Memory Issues
5.9 Reduction Operations in OpenMP
5.9.1 Example: Mutual In-Links
5.9.1.1 The Code
5.9.1.2 Sample Run
5.9.1.3 Analysis
5.9.2 Cache Issues
5.9.3 Rows vs. Columns
5.9.4 Processor Affinity
5.10 Debugging
5.10.1 Threads Commands in GDB
5.10.2 Using GDB on C/C++ Code Called from R
5.11 Intel Thread Building Blocks (TBB)
5.12 Lockfree Synchronization
5.13 Further Reading
6 The Shared-Memory Paradigm: GPUs
6.1 Overview
6.2 Another Note on Code Complexity
6.3 Goal of This Chapter
6.4 Introduction to NVIDIA GPUs and CUDA
6.4.1 Example: Calculate Row Sums
6.4.2 NVIDIA GPU Hardware Structure
6.4.2.1 Cores
6.4.2.2 Threads
6.4.2.3 The Problem of Thread Divergence
6.4.2.4 “OS in Hardware”
6.4.2.5 Grid Configuration Choices
6.4.2.6 Latency Hiding in GPUs
6.4.2.7 Shared Memory
6.4.2.8 More Hardware Details
6.4.2.9 Resource Limitations
6.5 Example: Mutual Inlinks Problem
6.5.1 The Code
6.5.2 Timing Experiments
6.6 Synchronization on GPUs
6.6.1 Data in Global Memory Is Persistent
6.7 R and GPUs
6.7.1 Example: Parallel Distance Computation
6.8 The Intel Xeon Phi Chip
6.9 Further Reading
7 Thrust and Rth
7.1 Hedging One’s Bets
7.2 Thrust Overview
7.3 Rth
7.4 Skipping the C++
7.5 Example: Finding Quantiles
7.5.1 The Code
7.5.2 Compilation and Timings
7.5.3 Code Analysis
7.6 Introduction to Rth
8 The Message Passing Paradigm
8.1 Message Passing Overview
8.2 The Cluster Model
8.3 Performance Issues
8.4 Rmpi
8.4.1 Installation and Execution
8.5 Example: Pipelined Method for Finding Primes
8.5.1 Algorithm
8.5.2 The Code
8.5.3 Timing Example
8.5.4 Latency, Bandwidth and Parallelism
8.5.5 Possible Improvements
8.5.6 Analysis of the Code
8.6 Memory Allocation Issues
8.7 Message-Passing Performance Subtleties
8.7.1 Blocking vs. Nonblocking I/O
8.7.2 The Dreaded Deadlock Problem
8.8 Further Reading
9 MapReduce Computation
9.1 Apache Hadoop
9.1.1 Hadoop Streaming
9.1.2 Example: Word Count
9.1.3 Running the Code
9.1.4 Analysis of the Code
9.1.5 Role of Disk Files
9.2 Other MapReduce Systems
9.3 R Interfaces to MapReduce Systems
9.4 An Alternative: “Snowdoop”
9.4.1 Example: Snowdoop Word Count
9.4.2 Example: Snowdoop k-Means Clustering
9.5 Further Reading
10 Parallel Sorting and Merging
10.1 The Elusive Goal of Optimality
10.2 Sorting Algorithms
10.2.1 Compare-and-Exchange Operations
10.2.2 Some “Representative” Sorting Algorithms
10.3 Example: Bucket Sort in R
10.4 Example: Quicksort in OpenMP
10.5 Sorting in Rth
10.6 Some Timing Comparisons
10.7 Sorting on Distributed Data
10.7.1 Hyperquicksort
10.8 Further Reading
11 Parallel Prefix Scan
11.1 General Formulation
11.2 Applications
11.3 General Strategies
11.3.1 A Log-Based Method
11.3.2 Another Way
11.4 Implementations of Parallel Prefix Scan
11.5 Parallel cumsum() with OpenMP
11.5.1 Stack Size Limitations
11.5.2 Let’s Try It Out
11.6 Example: Moving Average
11.6.1 Rth Code
11.6.2 Algorithm
11.6.3 Performance
11.6.4 Use of Lambda Functions
12 Parallel Matrix Operations
12.1 Tiled Matrices
12.2 Example: Snowdoop Approach
12.3 Parallel Matrix Multiplication
12.3.1 Multiplication on Message-Passing Systems
12.3.1.1 Distributed Storage
12.3.1.2 Fox’s Algorithm
12.3.1.3 Overhead Issues
12.3.2 Multiplication on Multicore Machines
12.3.2.1 Overhead Issues
12.3.3 Matrix Multiplication on GPUs
12.3.3.1 Overhead Issues
12.4 BLAS Libraries
12.4.1 Overview
12.5 Example: Performance of OpenBLAS
12.6 Example: Graph Connectedness
12.6.1 Analysis
12.6.2 The “Log Trick”
12.6.3 Parallel Computation
12.6.4 The matpow Package
12.6.4.1 Features
12.7 Solving Systems of Linear Equations
12.7.1 The Classical Approach: Gaussian Elimination and the LU Decomposition
12.7.2 The Jacobi Algorithm
12.7.2.1 Parallelization
12.7.3 Example: R/gputools Implementation of Jacobi
12.7.4 QR Decomposition
12.7.5 Some Timing Results
12.8 Sparse Matrices
12.9 Further Reading
13 Inherently Statistical Approaches: Subset Methods
13.1 Chunk Averaging
13.1.1 Asymptotic Equivalence
13.1.2 O(·) Analysis
13.1.3 Code
13.1.4 Timing Experiments
13.1.4.1 Example: Quantile Regression
13.1.4.2 Example: Logistic Model
13.1.4.3 Example: Estimating Hazard Functions
13.1.5 Non-i.i.d. Settings
13.2 Bag of Little Bootstraps
13.3 Subsetting Variables
13.4 Further Reading
A Review of Matrix Algebra
A.1 Terminology and Notation
A.1.1 Matrix Addition and Multiplication
A.2 Matrix Transpose
A.3 Linear Independence
A.4 Determinants
A.5 Matrix Inverse
A.6 Eigenvalues and Eigenvectors
A.7 Matrix Algebra in R
B R Quick Start
B.1 Correspondences
B.2 Starting R
B.3 First Sample Programming Session
B.4 Second Sample Programming Session
B.5 Third Sample Programming Session
B.6 The R List Type
B.6.1 The Basics
B.6.2 The Reduce() Function
B.6.3 S3 Classes
B.6.4 Handy Utilities
B.7 Debugging in R
C Introduction to C for R Programmers
C.0.1 Sample Program
C.0.2 Analysis
C.1 C++