
Pentaho Data Integration Cookbook (2nd ed)



DOCUMENT INFORMATION

Basic information

Format
Pages: 462
Size: 8.57 MB

Content

Pentaho Data Integration Cookbook
Second Edition

Over 100 recipes for building open source ETL solutions with Pentaho Data Integration

Alex Meadows
Adrián Sergio Pulvirenti
María Carina Roldán

BIRMINGHAM - MUMBAI

Pentaho Data Integration Cookbook
Second Edition

Copyright © 2013 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: June 2011
Second Edition: November 2013
Production Reference: 2221113

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK

ISBN 978-1-78328-067-4

www.packtpub.com

Cover Image by Aniket Sawant (aniket_sawant_photography@hotmail.com)

Credits

Authors: Alex Meadows, Adrián Sergio Pulvirenti, María Carina Roldán
Reviewers: Wesley Seidel Carvalho, Daniel Lemire, Coty Sutherland
Acquisition Editors: Usha Iyer, Meeta Rajani
Lead Technical Editor: Arvind Koul
Technical Editors: Dennis John, Adrian Raposo, Gaurav Thingalaya
Project Coordinator: Wendell Palmer
Proofreader: Kevin McGowan
Indexer: Monica Ajmera Mehta
Graphics: Ronak Dhruv
Production Coordinator: Nilesh R Mohite
Cover Work: Nilesh R Mohite

About the Author

Alex Meadows has worked with open source Business
Intelligence solutions for nearly 10 years and has worked in various industries such as plastics manufacturing, social and e-mail marketing, and most recently with software at Red Hat, Inc. He has been very active in Pentaho and other open source communities to learn, share, and help newcomers with the best practices in BI, analytics, and data management. He received his Bachelor's degree in Business Administration from Chowan University in Murfreesboro, North Carolina, and his Master's degree in Business Intelligence from St. Joseph's University in Philadelphia, Pennsylvania.

First and foremost, thank you Christina for being there for me before, during, and after taking on the challenge of writing and revising a book. I know it's not been easy, but thank you for allowing me the opportunity. To my grandmother, thank you for teaching me at a young age to always go for goals that may just be out of reach. Finally, this book would be nowhere without the Pentaho community and the friends I've made over the years being a part of it.

Adrián Sergio Pulvirenti was born in Buenos Aires, Argentina, in 1972. He earned his Bachelor's degree in Computer Sciences at UBA, one of the most prestigious universities in South America. He has dedicated more than 15 years to developing desktop and web-based software solutions. Over the last few years he has been leading integration projects and development of BI solutions.

I'd like to thank my lovely kids, Camila and Nicolas, who understood that I couldn't share with them the usual video game sessions during the writing process. I'd also like to thank my wife, who introduced me to the Pentaho world.

María Carina Roldán was born in Esquel, Argentina, in 1970. She earned her Bachelor's degree in Computer Science at UNLP in La Plata; after that she did a postgraduate course in Statistics at the University of Buenos Aires (UBA) in Buenos Aires city, where she has been living since 1994. She has worked as a BI consultant for more than 10 years. Over the
last four years, she has been dedicated full time to developing BI solutions using Pentaho Suite. Currently, she works for Webdetails, one of the main Pentaho contributors. She is the author of Pentaho 3.2 Data Integration: Beginner's Guide published by Packt Publishing in April 2010. You can follow her on Twitter at @mariacroldan.

I'd like to thank those who have encouraged me to write this book: On one hand, the Pentaho community; they have given me a rewarding feedback after the Beginner's book. On the other side, my husband, who without hesitation, agreed to write the book with me. Without them I'm not sure I would have embarked on a new book project.

I'd also like to thank the technical reviewers for the time and dedication that they have put in reviewing the book. In particular, thanks to my colleagues at Webdetails; it's a pleasure and a privilege to work with them every day.

About the Reviewers

Wesley Seidel Carvalho got his Master's degree in Computer Science from the Institute of Mathematics and Statistics, University of São Paulo (IME-USP), Brazil, where he researched on (his dissertation) Natural Language Processing (NLP) for the Portuguese language. He is a Database Specialist from the Federal University of Pará (UFPa). He has a degree in Mathematics from the State University of Pará (Uepa).

Since 2010, he has been working with Pentaho and researching Open Data government. He is an active member of the communities and lists of Free Software, Open Data, and Pentaho in Brazil, contributing software "Grammar Checker for OpenOffice - CoGrOO" and CoGrOO Community.

He has worked with technology, database, and systems development since 1997, Business Intelligence since 2003, and has been involved with Pentaho and NLP since 2009. He is currently serving its customers through its startups:

- http://intelidados.com.br
- http://ltasks.com.br

Daniel Lemire has a B.Sc. and a M.Sc. in Mathematics from the University of Toronto, and a Ph.D. in Engineering Mathematics from the Ecole
Polytechnique and the Université de Montréal. He is a Computer Science professor at TELUQ (Université du Québec) where he teaches primarily online. He has also been a research officer at the National Research Council of Canada and an entrepreneur. He has written over 45 peer-reviewed publications, including more than 25 journal articles. He has held competitive research grants for the last 15 years. He has served as a program committee member on leading computer science conferences (for example, ACM CIKM, ACM WSDM, and ACM RecSys). His open source software has been used by major corporations such as Google and Facebook. His research interests include databases, information retrieval, and high performance programming. He blogs regularly on computer science at http://lemire.me/blog/

Coty Sutherland was first introduced to computing around the age of 10. At that time, he was immersed in various aspects of computers and it became apparent that he had a propensity for software manipulation. From then until now, he has stayed involved in learning new things in the software space and adapting to the changing environment that is Software Development. He graduated from Appalachian State University in 2009 with a Bachelor's Degree in Computer Science. After graduation, he focused mainly on software application development and support, but recently transitioned to the Business Intelligence field to pursue new and exciting things with data. He is currently employed by the open source company, Red Hat, as a Business Intelligence Engineer.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available?
You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

- Fully searchable across every book published by Packt
- Copy and paste, print and bookmark content
- On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Table of Contents

Preface

Chapter 1: Working with Databases
Introduction
Connecting to a database
Getting data from a database
Getting data from a database by providing parameters
Getting data from a database by running a query built at runtime
Inserting or updating rows in a table
Inserting new rows where a simple primary key has to be generated
Inserting new rows where the primary key has to be generated based on stored values
Deleting data from a table
Creating or altering a database table from PDI (design time)
Creating or altering a database table from PDI (runtime)
Inserting, deleting, or updating a table depending on a field
Changing the database connection at runtime
Loading a parent-child table
Building SQL queries via database metadata
Performing repetitive database design tasks from PDI

Chapter 2: Reading and Writing Files
Introduction
Reading a simple file
Reading several files at the same time
Reading semi-structured files
Reading files having one field per row
Reading files with some fields occupying two or more rows
Writing a simple file
Writing a semi-structured file
...

... with Databases

In this chapter, we will cover:

- Connecting to a database
- Getting data from a database
- Getting data from a database by providing parameters
- Getting data from a database...

... Loading data into Salesforce.com
Getting data from Salesforce.com
Loading data into Hadoop
Getting data from Hadoop
Loading data into HBase
Getting data from HBase
Loading data

Date posted: 05/10/2018, 12:50
