Big Data Presentation
Big Data
Group 1, Instructor: Dr. Nguyễn Đức Thái
Memory storage…
Computer Memory: 640K Ought to Be Enough for Anyone
How much data?
7 billion people
Google processes 100 PB/day; 3 million servers
Facebook has 300 PB plus 500 TB of new data per day; 35% of the world's photos
YouTube: 1000 PB of video storage; 4 billion views/day
Twitter processes 124 billion tweets/year
SMS messages: 6.1T per year
US cell calls: 2.2T minutes per year
US credit cards: 1.4B cards; 20B transactions/year
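To put these figures in perspective, the short Python sketch below converts some of them into average per-second rates. It assumes a 365-day year and decimal petabytes (1 PB = 10^15 bytes), and it only computes averages; real traffic is bursty, so peak rates are much higher.

```python
# Back-of-the-envelope conversion of the slide's yearly/daily figures into
# average per-second rates (assumes a 365-day year and decimal units).

SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000 (ignores leap years)


def per_second(amount: float, period_seconds: int) -> float:
    """Average rate if `amount` were spread evenly over `period_seconds`."""
    return amount / period_seconds


# Inputs are the numbers quoted on the slide
tweets_per_sec = per_second(124e9, SECONDS_PER_YEAR)         # Twitter
card_tx_per_sec = per_second(20e9, SECONDS_PER_YEAR)         # US credit cards
google_bytes_per_sec = per_second(100e15, SECONDS_PER_DAY)   # 100 PB/day

if __name__ == "__main__":
    print(f"Tweets/s:            {tweets_per_sec:,.0f}")
    print(f"Card transactions/s: {card_tx_per_sec:,.0f}")
    print(f"Google TB/s:         {google_bytes_per_sec / 1e12:.2f}")
```

Run as a script, it prints about 3,932 tweets/s, 634 card transactions/s and 1.16 TB/s processed by Google.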
1 Big Data Overview
2 Big Data Technology Today
3 SQL vs NoSQL
4 Big Data Security
5 Big data trends
6 Demo with MongoDB & Ref docs
1 Big Data Overview (cont.)
“Big data is not a single technology but a combination of old and new technologies that helps companies gain actionable insight.”
(Quoted from the book Big Data For Dummies, published by John Wiley & Sons, Inc.)
1 Big Data Overview (cont.)
Characteristics of Big Data
Sources of Big Data
Examining Big Data Types
Structured Data
Structured Data (cont.)
Computer- or machine-generated: Machine-generated data generally refers to data that is created by a machine without human intervention (sensor data, web log data, point-of-sale data, financial data, …).
Human-generated: This is data that humans supply while interacting with computers (input data, click-stream data, gaming-related data, …).
Examining Big Data Types
Unstructured Data
Unstructured Data (cont.)
Unstructured data is everywhere.
Machine-generated unstructured data: satellite images, scientific data, photographs and video, radar
Managing different data types
Managing different data types (cont.)
Integrating data types into a big data environment requires:
Connectors: enable you to pull data in from various big data sources.
Metadata: the definitions, mappings, and other characteristics used to describe how to find, access, and use a company's data (and software) components.
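A minimal Python sketch of these two ideas, under stated assumptions: Connector, CsvConnector, JsonLinesConnector, DatasetMetadata and ingest are hypothetical names invented for illustration, not part of any big data product. Each connector pulls records out of one kind of source, and ingest records simple metadata (source type and field names) about what was pulled.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Iterator, List, Protocol, Set, Tuple


class Connector(Protocol):
    """Hypothetical interface: anything that can yield records as dicts."""

    def records(self) -> Iterator[dict]: ...


@dataclass
class CsvConnector:
    text: str  # CSV content; a real connector would read a file or stream

    def records(self) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(self.text))


@dataclass
class JsonLinesConnector:
    text: str  # one JSON object per line

    def records(self) -> Iterator[dict]:
        for line in self.text.splitlines():
            if line.strip():
                yield json.loads(line)


@dataclass
class DatasetMetadata:
    """Describes where a dataset came from and which fields it contains."""
    name: str
    source_type: str
    fields: Set[str] = field(default_factory=set)


def ingest(name: str, source_type: str,
           connector: Connector) -> Tuple[List[dict], DatasetMetadata]:
    """Pull every record through a connector and collect metadata about it."""
    rows = list(connector.records())
    meta = DatasetMetadata(name, source_type)
    for row in rows:
        meta.fields.update(row.keys())
    return rows, meta


if __name__ == "__main__":
    _, sales_meta = ingest("sales", "csv", CsvConnector("id,amount\n1,10\n2,20\n"))
    _, clicks_meta = ingest(
        "clicks", "jsonl",
        JsonLinesConnector('{"user": "a", "page": "/"}\n{"user": "b"}\n'))
    print(sales_meta)
    print(clicks_meta)
```

The design choice is that downstream code only sees plain records plus metadata, so adding a new source means writing one more connector rather than changing the consumers.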
What will we do with Big Data?
How do we store and handle Big Data?
2 Big Data Technology Today
Storage: NoSQL databases
2 Big Data Technology Today (cont.)
Processing
2 Big Data Technology Today (cont.)
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
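One of the simple programming models Hadoop provides is MapReduce. The Python sketch below simulates its three steps (map, shuffle, reduce) for a word count in a single process; it illustrates the model only and does not use Hadoop's actual Java API. In a real cluster, map calls run in parallel on the nodes holding the input splits, and reduce calls run in parallel per key.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str):
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data is big", "data is everywhere"]
    print(word_count(splits))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

The programmer only writes the map and reduce functions; splitting the input, moving data between nodes and re-running failed tasks are handled by the framework.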
2 Big Data Technology Today (cont.)
Instead of treating memory as a cache, why not treat it as a primary data store?
Facebook keeps 80% of its data in memory (Stanford Data Grid).
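The idea can be sketched in Python as follows. This is a hypothetical toy (InMemoryStore is not taken from any named system): all reads and writes are served from a dict in RAM, and an append-only log file on disk exists only so the data can be rebuilt after a restart. A production system would also fsync and compact the log and replicate it across machines.

```python
import json
import os
import tempfile


class InMemoryStore:
    """Toy store that keeps RAM as the primary copy of the data."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self) -> None:
        # On startup, rebuild the in-memory table from the on-disk log
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key: str, value) -> None:
        # Append the write to the log first, then apply it in memory
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
        self.data[key] = value

    def get(self, key: str, default=None):
        # Reads are answered from memory and never touch the disk
        return self.data.get(key, default)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.log")
        store = InMemoryStore(path)
        store.put("user:1", {"name": "An"})
        restarted = InMemoryStore(path)   # simulate a process restart
        print(restarted.get("user:1"))    # {'name': 'An'}
```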
2 Big Data Technology Today (cont.)
Transfer data:
2 Big Data Technology Today (cont.)
Apache Hadoop: an open-source software framework based on Google's designs
Google MapReduce → Hadoop Map/Reduce
GFS (Google File System) → HDFS (Hadoop Distributed File System)
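HDFS stores a large file as fixed-size blocks and keeps several copies of each block on different nodes. The Python sketch below imitates that using the Hadoop 2.x and later defaults (128 MiB blocks, replication factor 3). The function names are hypothetical, and the round-robin placement is a deliberate simplification: real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize (128 MiB) in Hadoop 2.x and later
REPLICATION = 3                 # default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    n_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    sizes = [block_size] * n_blocks
    remainder = file_size % block_size
    if remainder:
        sizes[-1] = remainder
    return sizes


def place_replicas(n_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Put each block on `replication` distinct nodes, round-robin.
    (Simplified: real HDFS uses a rack-aware placement policy.)"""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(n_blocks)
    }


if __name__ == "__main__":
    MiB = 1024 * 1024
    sizes = split_into_blocks(300 * MiB)
    print([s // MiB for s in sizes])  # [128, 128, 44]
    print(place_replicas(len(sizes), ["node1", "node2", "node3", "node4"]))
```

With four nodes, block 0 lands on node1, node2 and node3, block 1 on node2, node3 and node4, and block 2 on node3, node4 and node1, so losing any single node still leaves two copies of every block.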
3 SQL vs NoSQL
Data storage
File
SQL DBMS
NoSQL
3 SQL vs NoSQL (cont.)
A relational database is a set of tables containing data fitted into predefined categories.
Each table contains one or more data categories in columns.
Each row contains a unique instance of data for the categories defined by the columns.
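A minimal illustration using Python's built-in sqlite3 module as a stand-in for any relational DBMS (the customers table and its rows are made up for this example): the columns declared in CREATE TABLE are the predefined categories, and each inserted row is one instance of data for those categories.

```python
import sqlite3

# An in-memory SQLite database stands in for a relational DBMS
conn = sqlite3.connect(":memory:")

# The columns are the table's predefined categories (its schema)
conn.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        city  TEXT
    )
""")

# Each row is one instance of data for those categories
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "An", "Hanoi"), (2, "Binh", "Ho Chi Minh City"), (3, "Chi", "Hanoi")],
)

rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY id", ("Hanoi",)
).fetchall()
print(rows)  # [('An',), ('Chi',)]

# The schema is enforced: a row missing a required column is rejected
try:
    conn.execute("INSERT INTO customers (id, city) VALUES (4, 'Hue')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The rejected insert is the key contrast with the NoSQL stores that follow: the relational schema is fixed up front and every row must fit it.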
3 SQL vs NoSQL (cont.)
A key-value store is a system that stores values indexed for retrieval by keys.
Some of the market leaders:
Riak, Amazon Dynamo, Voldemort
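Dynamo-style key-value stores spread keys across servers with consistent hashing, the partitioning technique described in Amazon's Dynamo paper. The Python sketch below is a single-process toy of that idea; the class name, the 50 virtual nodes per server and the use of SHA-256 are illustrative assumptions, and replication and failure handling are omitted.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List, Tuple


def stable_hash(text: str) -> int:
    # Python's built-in hash() of str is randomized per process, so use SHA-256
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class ConsistentHashKV:
    """Toy key-value store whose keys are spread over nodes by a hash ring."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 50):
        self.storage: Dict[str, dict] = {node: {} for node in nodes}
        ring: List[Tuple[int, str]] = []
        for node in self.storage:
            for i in range(vnodes):  # virtual nodes even out the key distribution
                ring.append((stable_hash(f"{node}#{i}"), node))
        ring.sort()
        self._positions = [pos for pos, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping to the start
        idx = bisect.bisect_left(self._positions, stable_hash(key)) % len(self._positions)
        return self._owners[idx]

    def put(self, key: str, value) -> None:
        self.storage[self.node_for(key)][key] = value

    def get(self, key: str, default=None):
        return self.storage[self.node_for(key)].get(key, default)


if __name__ == "__main__":
    kv = ConsistentHashKV(["node-a", "node-b", "node-c"])
    for i in range(9):
        kv.put(f"user:{i}", f"profile {i}")
    print(kv.get("user:3"), "is stored on", kv.node_for("user:3"))
```

Adding or removing a node only moves the keys in the ring segments that node owns, which is why this scheme suits clusters that grow and shrink.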
3 SQL vs NoSQL (cont.)
Column-oriented databases
Column-oriented databases contain one extendable column of closely related data.
Some of the market leaders:
HBase, Cassandra
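A rough Python model of this layout, loosely following the HBase data model (row key, then column family, then column, then value). The class and method names are hypothetical, and real systems add versions/timestamps, sorted on-disk storage and distribution across servers. Column families are declared when the table is created, but new columns can be added to any row at any time, and rows need not share the same columns.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


class ColumnFamilyTable:
    """Toy wide-column table: row key -> column family -> column -> value."""

    def __init__(self, families: Iterable[str]):
        self.families = set(families)  # fixed when the table is created
        self.rows: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a family are created on first write
        self.rows[row_key][family][column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, Any]:
        # .get() avoids creating empty rows as a side effect of reading
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def scan_column(self, family: str, column: str) -> Dict[str, Any]:
        """Read one column across all rows that have it."""
        return {
            row_key: fams[family][column]
            for row_key, fams in self.rows.items()
            if column in fams.get(family, {})
        }


if __name__ == "__main__":
    users = ColumnFamilyTable(["info", "stats"])
    users.put("user1", "info", "name", "An")
    users.put("user1", "stats", "visits", 10)
    users.put("user2", "info", "name", "Binh")
    users.put("user2", "info", "email", "binh@example.com")  # only user2 has this column
    print(users.scan_column("info", "name"))  # {'user1': 'An', 'user2': 'Binh'}
```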
3 SQL vs NoSQL (cont.)
Document-based stores: These databases store and organize data as collections of documents, rather than as structured tables with uniform-sized fields for each record.
Some of the market leaders:
MongoDB, CouchDB, SimpleDB
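To show what "collections of documents" means, here is a hypothetical in-memory Python collection in which documents are dicts that do not all share the same fields, plus a find helper that supports only exact-match queries (a tiny subset of a real document database's query language). In MongoDB itself, the equivalent query from the mongo shell would be db.students.find({ major: "CS" }).

```python
from typing import Any, Dict, List

# Documents in one collection can have different fields: there is no fixed schema
students: List[Dict[str, Any]] = [
    {"_id": 1, "name": "An", "major": "CS", "skills": ["Python", "SQL"]},
    {"_id": 2, "name": "Binh", "major": "IS"},
    {"_id": 3, "name": "Chi", "major": "CS", "gpa": 3.6},
]


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the documents that have every field in `query` with an equal value."""
    missing = object()  # sentinel so a missing field never matches
    return [
        doc for doc in collection
        if all(doc.get(key, missing) == value for key, value in query.items())
    ]


if __name__ == "__main__":
    print([doc["name"] for doc in find(students, {"major": "CS"})])  # ['An', 'Chi']
    print(find(students, {"gpa": 3.6}))
```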
3 SQL vs NoSQL (cont.)
SQL 2008 Data storage capacity
The files collection stores the file's metadata. For details, see The files Collection.
3 SQL vs NoSQL (cont.)
BSON Types
The chunks Collection
The files Collection
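In MongoDB's GridFS, a large file is split into chunks (255 KiB each by default) stored as documents in the chunks collection, while one document in the files collection holds the file's metadata. The Python sketch below only builds documents of that shape in memory to show the split; it does not connect to MongoDB, and the helper names are hypothetical. Real GridFS drivers also give every chunk its own _id and may store extra metadata fields.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple

CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KiB (261,120 bytes)


def to_gridfs_documents(filename: str, data: bytes, file_id: Any,
                        chunk_size: int = CHUNK_SIZE) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Build one `files` document and the matching `chunks` documents."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),  # total size in bytes
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
    }
    chunk_docs = [
        # n is the chunk's sequence number, starting at 0
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs: List[Dict[str, Any]]) -> bytes:
    """Concatenate chunk payloads in order of their sequence number."""
    return b"".join(c["data"] for c in sorted(chunk_docs, key=lambda c: c["n"]))


if __name__ == "__main__":
    payload = bytes(600 * 1024)  # 600 KiB of zero bytes as a stand-in file
    meta, chunks = to_gridfs_documents("demo.bin", payload, file_id=1)
    print(meta["length"], [len(c["data"]) for c in chunks])  # 614400 [261120, 261120, 92160]
```

With the PyMongo driver, the gridfs package's GridFS(db).put(data, filename="demo.bin") performs this split and writes both collections for you.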
3 SQL vs NoSQL (cont.)
4 Big Data Security
• Secure computations in distributed
programming frameworks
• Security best practices for non-relational data stores
• Secure data storage and transaction logs
• Cryptographically enforced access control and secure communication
• Granular access control
• Real-time security/compliance monitoring
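Granular access control means deciding access per field or per cell rather than per whole dataset. A hypothetical Python sketch (the record, the role names and read_record are invented for illustration): each field carries the set of roles allowed to read it, and a read returns only the fields the caller's roles permit.

```python
# Each field maps to (value, roles allowed to read it)
RECORD = {
    "patient_id": ("P-001", {"doctor", "billing", "analyst"}),
    "diagnosis":  ("flu",   {"doctor"}),
    "amount_due": (120.0,   {"billing"}),
}


def read_record(record: dict, user_roles: set) -> dict:
    """Return only the fields that at least one of the user's roles may read."""
    return {
        field_name: value
        for field_name, (value, allowed_roles) in record.items()
        if user_roles & allowed_roles  # non-empty intersection grants access
    }


if __name__ == "__main__":
    print(read_record(RECORD, {"billing"}))  # {'patient_id': 'P-001', 'amount_due': 120.0}
    print(read_record(RECORD, {"analyst"}))  # {'patient_id': 'P-001'}
```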
4 Big Data Security (cont.)
Technical recommendations for security
• Use Kerberos for node authentication
• Use file-layer encryption
5 Big data trends
• Big data – of the people, by the people, for the people
• Big data and social computing
• Cloud computing
• Mobile Applications and HTML5
• Internet and big data
6 Demo with MongoDB & Ref docs
Ref docs:
Judith Hurwitz, Alan Nugent, Dr. Fern Halper, and Marcia Kaufman. Big Data For Dummies. John Wiley & Sons, Inc., 2013.
Kaushal Amin, Chief Technology Officer, KMS Technology, Atlanta, GA, USA. “Technology Trends for 2013”.
Apache Hadoop website: http://hadoop.apache.org/
Demo with MongoDB
Thank you!