
Big Data Now: 2016 Edition
Current Perspectives from O'Reilly Media

O'Reilly Media, Inc.
Beijing · Boston · Farnham · Sebastopol · Tokyo

Big Data Now: 2016 Edition
by O'Reilly Media, Inc.

Copyright © 2017 O'Reilly Media, Inc. All rights reserved. Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Nicole Tache
Production Editor: Nicholas Adams
Copyeditor: Gillian McGarvey
Proofreader: Amanda Kersey
Interior Designer: David Futato
Cover Designer: Randy Comer

February 2017: First Edition

Revision History for the First Edition
2017-01-27: First Release

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Big Data Now: 2016 Edition, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-97748-4 [LSI]

Table of Contents

Introduction

1. Careers in Data
  Five Secrets for Writing the Perfect Data Science Resume
  There's Nothing Magical About Learning Data Science
  Data Scientists: Generalists or Specialists?

2. Tools and Architecture for Big Data
  Apache Cassandra for Analytics: A Performance and Storage Analysis
  Scalable Data Science with R
  Data Science Gophers
  Applying the Kappa Architecture to the Telco Industry

3. Intelligent Real-Time Applications
  The World Beyond Batch Streaming
  Extend Structured Streaming for Spark ML
  Semi-Supervised, Unsupervised, and Adaptive Algorithms for Large-Scale Time Series
  Related Resources
  Uber's Case for Incremental Processing on Hadoop

4. Cloud Infrastructure
  Where Should You Manage a Cloud-Based Hadoop Cluster?
  Spark Comparison: AWS Versus GCP
  Time-Series Analysis on Cloud Infrastructure Metrics

5. Machine Learning: Models and Training
  What Is Hardcore Data Science—in Practice?
  Training and Serving NLP Models Using Spark MLlib
  Three Ideas to Add to Your Data Science Toolkit
  Related Resources
  Introduction to Local Interpretable Model-Agnostic Explanations (LIME)

6. Deep Learning and AI
  The Current State of Machine Intelligence 3.0
  Hello, TensorFlow!
  Compressing and Regularizing Deep Neural Networks

Introduction

Big data pushed the boundaries in 2016. It pushed the boundaries of tools, applications, and skill sets. And it did so because it's bigger, faster, more prevalent, and more prized than ever.

According to O'Reilly's 2016 Data Science Salary Survey, the top tools used for data science continue to be SQL, Excel, R, and Python. A common theme in recent tool-related blog posts on oreilly.com is the need for powerful storage and compute tools that can process high-volume, often streaming, data. For example, Federico Castanedo's blog post "Scalable Data Science with R" describes how scaling R using distributed frameworks—such as RHadoop and SparkR—can help solve the problem of storing massive data sets in RAM.

Focusing on storage, more organizations are looking to migrate their data, and storage and compute operations, from warehouses on proprietary software to managed services in the cloud. There is, and will continue to be, a lot to talk about on this topic: building a data pipeline in the cloud, security and governance of data in the cloud, cluster monitoring and tuning to optimize resources, and of course, the three providers that dominate this area—namely, Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.

In terms of techniques, machine learning and deep learning continue to generate buzz in the industry. The algorithms behind natural language processing and image recognition, for example, are incredibly complex, and their utility in the enterprise hasn't been fully realized. Until recently, machine learning and deep learning have been largely confined to the realm of research and academia. We're now seeing a surge of interest in organizations looking to apply these techniques to their business use cases to achieve automated, actionable insights. Evangelos Simoudis discusses this in his O'Reilly blog post "Insightful applications: The next inflection in big data." Accelerating this trend are open source tools, such as TensorFlow from the Google Brain Team, which put machine learning into the hands of any person or entity who wishes to learn about it.

We continue to see smartphones, sensors, online banking sites, cars, and even toys generating more data, of varied structure. O'Reilly's Big Data Market report found that a surprisingly high percentage of organizations' big data budgets are spent on Internet-of-Things-related initiatives. More tools for fast, intelligent processing of real-time data are emerging (Apache Kudu and FiloDB, for example), and organizations across industries are looking to architect robust pipelines for real-time data processing. Which components will allow them to efficiently store and analyze the rapid-fire data? Who will build and manage this technology stack? And, once it is constructed, who will communicate the insights to upper management?
These questions highlight another interesting trend we're seeing: the need for cross-pollination of skills among technical and nontechnical folks. Engineers are seeking the analytical and communication skills so common in data scientists and business analysts, and data scientists and business analysts are seeking the hard-core technical skills possessed by engineers, programmers, and the like.

Data science continues to be a hot field and continues to attract a range of people—from IT specialists and programmers to business school graduates—looking to rebrand themselves as data science professionals. In this context, we're seeing tools push the boundaries of accessibility, applications push the boundaries of industry, and professionals push the boundaries of their skill sets. In short, data science shows no sign of losing momentum.

In Big Data Now: 2016 Edition, we present a collection of some of the top blog posts written for oreilly.com in the past year, organized around six key themes:

• Careers in data
• Tools and architecture for big data
• Intelligent real-time applications
• Cloud infrastructure
• Machine learning: models and training
• Deep learning and AI

Let's dive in!

Hello, TensorFlow!

## }
## }
## attr {
##   key: "value"
##   value {
##     tensor {
##       dtype: DT_FLOAT
##       tensor_shape {
##       }
##       float_val: 1.0
##     }
##   }
## }

TensorFlow uses protocol buffers internally. (Protocol buffers are sort of like a Google-strength JSON.) Printing the node_def for the constant operation in the preceding code block shows what's in TensorFlow's protocol buffer representation for the number one.

People new to TensorFlow sometimes wonder why there's all this fuss about making "TensorFlow versions" of things. Why can't we just use a normal Python variable without also defining a TensorFlow object? One of the TensorFlow tutorials has an explanation:

    To do efficient numerical computing in Python, we typically use libraries like NumPy that do expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead from switching back to Python every operation. This overhead is especially bad if you want to run computations on GPUs or in a distributed manner, where there can be a high cost to transferring data.

    TensorFlow also does its heavy lifting outside Python, but it takes things a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. This approach is similar to that used in Theano or Torch.

TensorFlow can do a lot of great things, but it can only work with what's been explicitly given to it. This is true even for a single constant. If we inspect our input_value, we see it is a constant 32-bit float tensor of no dimension: just one number:

>>> input_value
## <tf.Tensor 'Const:0' shape=() dtype=float32>

Note that this doesn't tell us what that number is. To evaluate input_value and get a numerical value out, we need to create a "session" where graph operations can be evaluated and then explicitly ask to evaluate or "run" input_value. (The session picks up the default graph by default.)
>>> sess = tf.Session()
>>> sess.run(input_value)
## 1.0

It may feel a little strange to "run" a constant. But it isn't so different from evaluating an expression as usual in Python; it's just that TensorFlow is managing its own space of things—the computational graph—and it has its own method of evaluation.

The Simplest TensorFlow Neuron

Now that we have a session with a simple graph, let's build a neuron with just one parameter, or weight. Often, even simple neurons also have a bias term and a nonidentity activation function, but we'll leave these out. The neuron's weight isn't going to be constant; we expect it to change in order to learn based on the "true" input and output we use for training. The weight will be a TensorFlow variable. We'll give that variable a starting value of 0.8:

>>> weight = tf.Variable(0.8)

You might expect that adding a variable would add one operation to the graph, but in fact that one line adds four operations. We can check all the operation names:

>>> for op in graph.get_operations(): print(op.name)
## Const
## Variable/initial_value
## Variable
## Variable/Assign
## Variable/read

We won't want to follow every operation individually for long, but it will be nice to see at least one that feels like a real computation:

>>> output_value = weight * input_value

Now there are six operations in the graph, and the last one is that multiplication:

>>> op = graph.get_operations()[-1]
>>> op.name
## 'mul'
>>> for op_input in op.inputs: print(op_input)
## Tensor("Variable/read:0", shape=(), dtype=float32)
## Tensor("Const:0", shape=(), dtype=float32)

This shows how the multiplication operation tracks where its inputs come from: they come from other operations in the graph. To understand a whole graph, following references this way quickly becomes tedious for humans. TensorBoard graph visualization is designed to help.

How do we find out what the product is? We have to "run" the output_value operation. But that operation depends on a variable: weight. We told TensorFlow that the initial value of weight should be 0.8, but the value hasn't yet been set in the current session. The tf.initialize_all_variables() function generates an operation which will initialize all our variables (in this case just one), and then we can run that operation:

>>> init = tf.initialize_all_variables()
>>> sess.run(init)

The result of tf.initialize_all_variables() will include initializers for all the variables currently in the graph, so if you add more variables you'll want to use tf.initialize_all_variables() again; a stale init wouldn't include the new variables.

Now we're ready to run the output_value operation:

>>> sess.run(output_value)
## 0.80000001

Recall that it is 0.8 * 1.0 with 32-bit floats, and 32-bit floats have a hard time with 0.8; 0.80000001 is as close as they can get.

See Your Graph in TensorBoard

Up to this point, the graph has been simple, but it would already be nice to see it represented in a diagram. We'll use TensorBoard to generate that diagram. TensorBoard reads the name field that is stored inside each operation (quite distinct from Python variable names). We can use these TensorFlow names and switch to more conventional Python variable names. Using tf.mul here is equivalent to our earlier use of just * for multiplication, but it lets us set the name for the operation:
>>> x = tf.constant(1.0, name='input')
>>> w = tf.Variable(0.8, name='weight')
>>> y = tf.mul(w, x, name='output')

TensorBoard works by looking at a directory of output created from TensorFlow sessions. We can write this output with a SummaryWriter, and if we do nothing aside from creating one with a graph, it will just write out that graph. The first argument when creating the SummaryWriter is an output directory name, which will be created if it doesn't exist:

>>> summary_writer = tf.train.SummaryWriter('log_simple_graph', sess.graph)

Now, at the command line, we can start up TensorBoard:

$ tensorboard --logdir=log_simple_graph

TensorBoard runs as a local web app, on port 6006. ("6006" is "goog" upside down.) If you go to localhost:6006/#graphs in a browser, you should see a diagram of the graph you created in TensorFlow, which looks something like Figure 6-3.

Figure 6-3. A TensorBoard visualization of the simplest TensorFlow neuron

Making the Neuron Learn

Now that we've built our neuron, how does it learn? We set up an input value of 1.0. Let's say the correct output value is zero. That is, we have a very simple "training set" of just one example with one feature, which has the value 1, and one label, which is zero. We want the neuron to learn the function taking 1 to 0.

Currently, the system takes the input 1 and returns 0.8, which is not correct. We need a way to measure how wrong the system is. We'll call that measure of wrongness the "loss" and give our system the goal of minimizing the loss. If the loss can be negative, then minimizing it could be silly, so let's make the loss the square of the difference between the current output and the desired output:

>>> y_ = tf.constant(0.0)
>>> loss = (y - y_)**2

So far, nothing in the graph does any learning. For that, we need an optimizer. We'll use a gradient descent optimizer so that we can update the weight based on the derivative of the loss. The optimizer takes a learning rate to moderate the size of the updates, which we'll set at 0.025:

>>> optim = tf.train.GradientDescentOptimizer(learning_rate=0.025)

The optimizer is remarkably clever. It can automatically work out and apply the appropriate gradients through a whole network, carrying out the backward step for learning. Let's see what the gradient looks like for our simple example:

>>> grads_and_vars = optim.compute_gradients(loss)
>>> sess.run(tf.initialize_all_variables())
>>> sess.run(grads_and_vars[1][0])
## 1.6

Why is the value of the gradient 1.6? Our loss is error squared, and the derivative of that is two times the error. Currently the system says 0.8 instead of 0, so the error is 0.8, and two times 0.8 is 1.6. It's working!

For more complex systems, it will be very nice indeed that TensorFlow calculates and then applies these gradients for us automatically. Let's apply the gradient, finishing the backpropagation:

>>> sess.run(optim.apply_gradients(grads_and_vars))
>>> sess.run(w)
## 0.75999999  # about 0.76

The weight decreased by 0.04 because the optimizer subtracted the gradient times the learning rate, 1.6 * 0.025, pushing the weight in the right direction. Instead of hand-holding the optimizer like this, we can make one operation that calculates and applies the gradients: the train_step:
>>> train_step = tf.train.GradientDescentOptimizer(0.025).minimize(loss)
>>> for i in range(100):
>>>     sess.run(train_step)
>>>
>>> sess.run(y)
## 0.0044996012

After running the training step many times, the weight and the output value are now very close to zero. The neuron has learned!

Training diagnostics in TensorBoard

We may be interested in what's happening during training. Say we want to follow what our system is predicting at every training step. We could print from inside the training loop:

>>> sess.run(tf.initialize_all_variables())
>>> for i in range(100):
>>>     print('before step {}, y is {}'.format(i, sess.run(y)))
>>>     sess.run(train_step)
>>>
## before step 0, y is 0.800000011921
## before step 1, y is 0.759999990463
## ...
## before step 98, y is 0.00524811353534
## before step 99, y is 0.00498570781201

This works, but there are some problems. It's hard to understand a list of numbers. A plot would be better. And even with only one value to monitor, there's too much output to read. We're likely to want to monitor many things. It would be nice to record everything in some organized way.

Luckily, the same system that we used earlier to visualize the graph also has just the mechanisms we need. We instrument the computation graph by adding operations that summarize its state. Here, we'll create an operation that reports the current value of y, the neuron's current output:

>>> summary_y = tf.scalar_summary('output', y)

When you run a summary operation, it returns a string of protocol buffer text that can be written to a log directory with a SummaryWriter:

>>> summary_writer = tf.train.SummaryWriter('log_simple_stats')
>>> sess.run(tf.initialize_all_variables())
>>> for i in range(100):
>>>     summary_str = sess.run(summary_y)
>>>     summary_writer.add_summary(summary_str, i)
>>>     sess.run(train_step)

Now after running tensorboard --logdir=log_simple_stats, you get an interactive plot at localhost:6006/#events (Figure 6-4).

Figure 6-4. A TensorBoard visualization of a neuron's output against training iteration number

Flowing Onward

Here's a final version of the code. It's fairly minimal, with every part showing useful (and understandable) TensorFlow functionality:

import tensorflow as tf

x = tf.constant(1.0, name='input')
w = tf.Variable(0.8, name='weight')
y = tf.mul(w, x, name='output')
y_ = tf.constant(0.0, name='correct_value')
loss = tf.pow(y - y_, 2, name='loss')
train_step = tf.train.GradientDescentOptimizer(0.025).minimize(loss)

for value in [x, w, y, y_, loss]:
    tf.scalar_summary(value.op.name, value)

summaries = tf.merge_all_summaries()

sess = tf.Session()
summary_writer = tf.train.SummaryWriter('log_simple_stats', sess.graph)

sess.run(tf.initialize_all_variables())
for i in range(100):
    summary_writer.add_summary(sess.run(summaries), i)
    sess.run(train_step)
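This listing uses the TensorFlow 0.x API that was current when the post was written. If you are on a later 1.x release, several of these functions were renamed; a rough adaptation of the same program, assuming TensorFlow 1.x (this is an illustrative sketch, not part of the original post), would be:

import tensorflow as tf

x = tf.constant(1.0, name='input')
w = tf.Variable(0.8, name='weight')
y = tf.multiply(w, x, name='output')       # tf.mul was renamed tf.multiply
y_ = tf.constant(0.0, name='correct_value')
loss = tf.pow(y - y_, 2, name='loss')
train_step = tf.train.GradientDescentOptimizer(0.025).minimize(loss)

for value in [x, w, y, y_, loss]:
    tf.summary.scalar(value.op.name, value)  # tf.scalar_summary was renamed

summaries = tf.summary.merge_all()           # tf.merge_all_summaries was renamed

sess = tf.Session()
summary_writer = tf.summary.FileWriter('log_simple_stats', sess.graph)  # SummaryWriter became FileWriter

sess.run(tf.global_variables_initializer())  # initialize_all_variables was renamed
for i in range(100):
    summary_writer.add_summary(sess.run(summaries), i)
    sess.run(train_step)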
The example we just ran through is even simpler than the ones that inspired it in Michael Nielsen's Neural Networks and Deep Learning. For myself, seeing details like these helps with understanding and building more complex systems that use and extend from simple building blocks. Part of the beauty of TensorFlow is how flexibly you can build complex systems from simpler components.

If you want to continue experimenting with TensorFlow, it might be fun to start making more interesting neurons, perhaps with different activation functions. You could train with more interesting data. You could add more neurons. You could add more layers. You could dive into more complex prebuilt models, or spend more time with TensorFlow's own tutorials and how-to guides. Go for it!

Compressing and Regularizing Deep Neural Networks

By Song Han

You can read this post on oreilly.com here.

Deep neural networks have evolved to be the state-of-the-art technique for machine-learning tasks ranging from computer vision and speech recognition to natural language processing. However, deep-learning algorithms are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources.

To address this limitation, deep compression significantly reduces the computation and storage required by neural networks. For example, for a convolutional neural network with fully connected layers, such as AlexNet and VGGNet, it can reduce the model size by 35x–49x. Even for fully convolutional neural networks such as GoogleNet and SqueezeNet, deep compression can still reduce the model size by 10x. Both scenarios result in no loss of prediction accuracy.

Current Training Methods Are Inadequate

Compression without losing accuracy means there's significant redundancy in the trained model, which shows the inadequacy of current training methods. To address this, I've worked with Jeff Pool of NVIDIA, Sharan Narang of Baidu, and Peter Vajda of Facebook to develop dense-sparse-dense (DSD) training, a novel training method that first regularizes the model through sparsity-constrained optimization, and improves the prediction accuracy by recovering and retraining on pruned weights. At test time, the final model produced by DSD training still has the same architecture and dimension as the original dense model, and DSD training doesn't incur any inference overhead. We experimented with DSD training on mainstream CNN/RNN/LSTMs for image classification, image captioning, and speech recognition and found substantial performance improvements.

In this article, we first introduce deep compression, and then introduce dense-sparse-dense training.

Deep Compression

The first step of deep compression is synaptic pruning. The human brain inherently has the process of pruning: 5x synapses are pruned away between infancy and adulthood. Does a similar process occur in artificial neural networks?
The answer is yes. In early work, network pruning proved to be a valid way to reduce the network complexity and overfitting. This method works on modern neural networks as well. We start by learning the connectivity via normal network training. Next, we prune the small-weight connections: all connections with weights below a threshold are removed from the network. Finally, we retrain the network to learn the final weights for the remaining sparse connections. Pruning reduced the number of parameters by 9x and 13x for AlexNet and the VGG-16 model, respectively.

Figure 6-5. Pruning a neural network. Credit: Song Han

The next step of deep compression is weight sharing. We found neural networks have a really high tolerance for low precision: aggressive approximation of the weight values does not hurt the prediction accuracy. As shown in Figure 6-6, the blue weights are originally 2.09, 2.12, 1.92, and 1.87; by letting four of them share the same value, which is 2.00, the accuracy of the network can still be recovered. Thus we can store just a few distinct weight values (a "codebook") and let many other weights share those values, storing only each weight's index into the codebook.

The index could be represented with very few bits; for example, in Figure 6-6, there are four colors, thus only two bits are needed to represent a weight as opposed to 32 bits originally. The codebook, on the other hand, occupies negligible storage. Our experiments found this kind of weight-sharing technique is better than linear quantization, with respect to the compression ratio and accuracy trade-off.

Figure 6-6. Training a weight-sharing neural network

Figure 6-7 shows the overall result of deep compression. LeNet-300-100 and LeNet-5 are evaluated on the MNIST data set, while AlexNet, VGGNet, GoogleNet, and SqueezeNet are evaluated on the ImageNet data set. The compression ratio ranges from 10x to 49x; even for those fully convolutional neural networks like GoogleNet and SqueezeNet, deep compression can still compress them by an order of magnitude. We highlight SqueezeNet, which has 50x fewer parameters than AlexNet but the same accuracy, and can still be compressed by 10x, making it only 470 KB. This makes it easy to fit in on-chip SRAM, which is both faster and more energy-efficient to access than DRAM.

We have tried other compression methods, such as low-rank approximation-based methods, but the compression ratio isn't as high. A complete discussion can be found in the "Deep Compression" paper.

Figure 6-7. Results of deep compression

DSD Training

The fact that deep neural networks can be aggressively pruned and compressed means that our current training method has some limitation: it cannot fully exploit the full capacity of the dense model to find the best local minimum; yet a pruned, sparse model that has far fewer synapses can achieve the same accuracy. This raises a question: can we achieve better accuracy by recovering those weights and learning them again?
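Before turning to that question, here is a minimal NumPy sketch of the two compression steps just described, magnitude-based pruning and codebook weight sharing. It is an illustrative approximation rather than the implementation used in the paper; the layer shape, pruning threshold, and codebook size are arbitrary choices, and a real deep-compression pipeline also retrains the network after each step.

import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(256, 128))    # a made-up fully connected layer

# Step 1: magnitude pruning. Remove connections whose |weight| falls below a threshold.
threshold = 0.05                                     # arbitrary; in practice chosen per layer
mask = np.abs(weights) >= threshold                  # True for connections we keep
pruned = weights * mask
print("sparsity after pruning: %.1f%%" % (100.0 * (1 - mask.mean())))
# (The surviving weights would then be retrained with the mask held fixed.)

# Step 2: weight sharing. Cluster the surviving weights into a small codebook
# and store only each weight's index into that codebook.
k = 16                                               # 16 centroids -> 4-bit indices
survivors = pruned[mask]
centroids = np.linspace(survivors.min(), survivors.max(), k)   # linear initialization

for _ in range(20):                                  # a few iterations of 1-D k-means
    assign = np.argmin(np.abs(survivors[:, None] - centroids[None, :]), axis=1)
    for j in range(k):
        if np.any(assign == j):
            centroids[j] = survivors[assign == j].mean()

# The compressed layer is just the boolean mask, the 4-bit indices, and the tiny codebook.
quantized = pruned.copy()
quantized[mask] = centroids[assign]
print("mean absolute quantization error:", np.mean(np.abs(quantized[mask] - survivors)))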
Let's make an analogy to training for track racing in the Olympics. The coach will first train a runner on high-altitude mountains, where there are a lot of constraints: low oxygen, cold weather, etc. The result is that when the runner returns to lower ground, his or her speed is increased. Similarly for neural networks: given the heavily constrained sparse training, the network performs as well as the dense model; once you release the constraint, the model can work better.

Theoretically, the following factors contribute to the effectiveness of DSD training:

1. Escape saddle points: one of the most profound difficulties of optimizing deep networks is the proliferation of saddle points. DSD training overcomes saddle points by a pruning and re-densing framework. Pruning the converged model perturbs the learning dynamics and allows the network to jump away from saddle points, which gives the network a chance to converge at a better local or global minimum. This idea is also similar to simulated annealing. While simulated annealing randomly jumps with decreasing probability on the search graph, DSD deterministically deviates from the converged solution achieved in the first dense training phase by removing the small weights and enforcing a sparsity support.

2. Regularized and sparse training: the sparsity regularization in the sparse training step moves the optimization to a lower-dimensional space where the loss surface is smoother and tends to be more robust to noise. More numerical experiments verified that both sparse training and the final DSD reduce the variance and lead to lower error.

3. Robust reinitialization: weight initialization plays a big role in deep learning. Conventional training has only one chance of initialization. DSD gives the optimization a second (or more) chance during the training process to reinitialize from more robust sparse training solutions. We re-dense the network from the sparse solution, which can be seen as a zero initialization for pruned weights. Other initialization methods are also worth trying.

4. Break symmetry: the permutation symmetry of the hidden units makes the weights symmetrical, and thus prone to co-adaptation in training. In DSD, pruning the weights breaks the symmetry of the hidden units associated with the weights, and the weights are asymmetrical in the final dense phase.

We examined several mainstream CNN/RNN/LSTM architectures on image classification, image captioning, and speech recognition data sets, and found that this dense-sparse-dense training flow gives significant accuracy improvements. Our DSD training employs a three-step process: dense, sparse, dense; each step is illustrated in Figure 6-8.

Figure 6-8. Dense-sparse-dense training flow

1. Initial dense training: the first D step learns the connectivity via normal network training on the dense network. Unlike conventional training, however, the goal of this D step is not to learn the final values of the weights; rather, we are learning which connections are important.

2. Sparse training: the S step prunes the low-weight connections and retrains the sparse network. We applied the same sparsity to all the layers in our experiments; thus there's a single hyperparameter: the sparsity. For each layer, we sort the parameters, and the smallest N*sparsity parameters are removed from the network, converting a dense network into a sparse network. We found that a sparsity ratio of 50%–70% works very well. Then, we retrain the sparse
network, which can fully recover the model accuracy under the sparsity constraint.

3. Final dense training: the final D step recovers the pruned connections, making the network dense again. These previously pruned connections are initialized to zero and retrained. Restoring the pruned connections increases the dimensionality of the network, and more parameters make it easier for the network to slide down the saddle point to arrive at a better local minimum. (A schematic code sketch of this three-step flow appears at the end of this article.)

We applied DSD training to different kinds of neural networks on data sets from different domains. We found that DSD training improved the accuracy for all these networks compared to neural networks that were not trained with DSD. The neural networks are chosen from CNN, RNN, and LSTMs; the data sets are chosen from image classification, speech recognition, and caption generation. The results are shown in Figure 6-9. DSD models are available to download at the DSD Model Zoo.

Figure 6-9. DSD training improves the prediction accuracy

Generating Image Descriptions

We visualized the effect of DSD training on an image captioning task (see Figure 6-10). We applied DSD to NeuralTalk, an LSTM for generating image descriptions. The baseline model fails to describe images 1 and 4, among others. For example, in the first image, the baseline model mistakes the girl for a boy, and mistakes the girl's hair for a rock wall; the sparse model can tell that it's a girl in the image, and the DSD model can further identify the swing.

In the second image, DSD training can tell that the player is trying to make a shot, whereas the baseline just says he's playing with a ball. It's interesting to notice that the sparse model sometimes works better than the DSD model. In the last image, the sparse model correctly captured the mud puddle, while the DSD model only captured the forest from the background. The good performance of DSD training generalizes beyond these examples, and more image caption results generated by DSD training are provided in the appendix of this paper.

Figure 6-10. Visualization of how DSD training improves the performance of image captioning

Advantages of Sparsity

Deep compression, for compressing deep neural networks to a smaller model size, and DSD training, for regularizing neural networks, are techniques that utilize sparsity to achieve a smaller size or higher prediction accuracy. Apart from model size and prediction accuracy, we looked at two other dimensions that take advantage of sparsity: speed and energy efficiency, which are beyond the scope of this article. Readers can refer to our paper "EIE: Efficient Inference Engine on Compressed Deep Neural Network" for further reference.
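As a recap of the dense-sparse-dense flow described above, the following is a self-contained toy sketch: a tiny linear regression trained with plain gradient descent, where the only thing that changes between the three phases is a mask over the weights. The problem, the 30% sparsity ratio, and the step counts are made up for illustration; this is not the networks or training code used in the experiments.

import numpy as np

rng = np.random.default_rng(1)

# A toy regression problem stands in for the real networks used in the paper.
X = rng.normal(size=(200, 50))
true_w = rng.normal(size=50) * (rng.random(50) < 0.3)   # mostly sparse ground truth
y = X @ true_w + 0.01 * rng.normal(size=200)

def train(w, mask, steps=500, lr=0.01):
    """Gradient descent on mean squared error; `mask` keeps pruned weights at zero."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

sparsity = 0.3                               # prune the smallest 30% of weights (placeholder value)

# D: initial dense training, mainly to learn which connections matter.
w = train(np.zeros(50), np.ones(50))

# S: prune the smallest-magnitude weights, then retrain under the sparsity constraint.
k = int(sparsity * w.size)
threshold = np.sort(np.abs(w))[k]
mask = (np.abs(w) >= threshold).astype(float)
w = train(w * mask, mask)

# D: re-dense. Pruned weights come back, initialized to zero, and everything is retrained.
w = train(w, np.ones(50))

print("final mean squared error:", np.mean((X @ w - y) ** 2))

The point of the sketch is only the shape of the flow: the same training routine is called three times, and the mask is the single piece of state that switches the network between its dense and sparse phases.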
