HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY
──────── * ───────
COURSE REPORT
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
Topic: Neural Style Transfer
Students: Group 12, Class CNTT-K67
1. Nguyễn Thị Mai Quyên - 20229034
HANOI, 06-2023
PREFACE
This project is a testament to the remarkable potential of AI, particularly in the field of neural style transfer. By leveraging the power of deep learning algorithms and neural networks, we embark on a journey that seeks to bridge the gap between diverse artistic styles and create awe-inspiring visual transformations.

With the advent of artificial intelligence and the rapid progress in machine learning, we now possess a tool that can harness the knowledge acquired from countless masterpieces and replicate the intricate details and distinctive characteristics of diverse artistic styles. The deep neural networks at the heart of this project enable us to dissect and comprehend the underlying patterns, strokes, and textures that define a particular style.

But this project goes beyond mere replication. It strives to infuse the creations of the past into the present, allowing the juxtaposition of artistic epochs and the fusion of artistic sensibilities. By seamlessly merging the content of one image with the style of another, we enable the birth of harmonious hybrids that transcend time and challenge our perception of visual aesthetics.

Through the following chapters, we will delve into the inner workings of neural style transfer, unraveling the intricate processes that enable this remarkable transformation. We will explore the foundations of deep learning, the underlying mathematics, and the neural network architectures that serve as the backbone of our AI system.

Ultimately, this project is a celebration of the symbiotic relationship between human creativity and artificial intelligence. It aims to inspire and provoke, challenging us to reimagine the possibilities of artistic expression and encouraging us to embrace the convergence of traditional and emerging mediums.
ACKNOWLEDGEMENTS

We would like to express our sincere thanks to Mr. Pham Van Hai for his enthusiastic guidance and help in preparing this report.

We would also like to thank our families and friends, who have always encouraged and supported us and shared a great deal of experience and knowledge, especially during the reporting period, so that this report could be completed as successfully as possible. However, due to our limited knowledge, experience, and ability, the report may still contain many mistakes; we hope to receive Mr. Pham Van Hai's feedback and understanding.

We sincerely thank you!
PREFACE
ACKNOWLEDGEMENTS
I PROBLEM ANALYSIS
1 Idea
2 A neural algorithm of artistic style [1]
2.1 Model Architecture
2.2 Cost function
2.2.1 Content loss function
2.2.2 Style loss function
3 Fast neural style transfer network [3]
II REFERENCES
I PROBLEM ANALYSIS
1 Idea
Artistic style transfer may be defined as creating a stylized image $x$ from a content image $c$ and a style image $s$. Typically, the content image $c$ is a photograph and the style image $s$ is a painting. A Neural Algorithm of Artistic Style [1] posits that the content and style of an image may be defined as follows:
- Two images are similar in content if their high-level features, as extracted by an image recognition system, are close in Euclidean distance.
- Two images are similar in style if their low-level features, as extracted by an image recognition system, share the same spatial statistics.
2 A neural algorithm of artistic style [1]
2.1 Model Architecture
On the machine learning side, it has been shown that a trained classifier can be used as a feature extractor to drive texture synthesis and style transfer. Gatys et al. (2015a) use the VGG-19 network (Simonyan & Zisserman, 2014) to extract features from a texture image and a synthesized texture. The two sets of features are compared, and the synthesized texture is modified by gradient descent so that the two sets of features are as close as possible. Gatys et al. (2015b) extend this idea to style transfer by adding the constraint that the synthesized image also be close to a content image with respect to another set of features extracted by the trained VGG-19 classifier. The system uses neural representations to separate and recombine the content and style of arbitrary images, providing a neural algorithm for the creation of artistic images.
Figure 1: General structure of the system. Convolutional Neural Network (CNN): a given input image is represented as a set of filtered images at each processing stage in the CNN. While the number of different filters increases along the processing hierarchy, the size of the filtered images is reduced by some downsampling mechanism (e.g. max-pooling), leading to a decrease in the total number of units per layer of the network. Content Reconstructions: we can visualise the information at different processing stages in the CNN by reconstructing the input image from only knowing the network's responses in a particular layer. We reconstruct the input image from layers 'conv1_1' (a), 'conv2_1' (b), 'conv3_1' (c), 'conv4_1' (d) and 'conv5_1' (e) of the original VGG network. We find that reconstruction from lower layers is almost perfect (a, b, c), while in higher layers of the network detailed pixel information is lost and the high-level content of the image is preserved (d, e). Style Reconstructions: on top of the original CNN representations we build a new feature space that captures the style of an input image. The style representation computes correlations between the different features in different layers of the CNN. We reconstruct the style of the input image from style representations built on increasing subsets of CNN layers: 'conv1_1' (a); 'conv1_1' and 'conv2_1' (b); 'conv1_1', 'conv2_1' and 'conv3_1' (c); 'conv1_1', 'conv2_1', 'conv3_1' and 'conv4_1' (d); 'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1' and 'conv5_1' (e). This creates images that match the style of a given image on an increasing scale while discarding information about the global arrangement of the scene.
The original paper recommends using the pre-trained VGG-19 network, a convolutional neural network that has been shown to perform excellently at extracting image features; a sketch of this usage follows.
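To make the feature-extraction step concrete, here is a minimal sketch, assuming PyTorch and torchvision (our choice of framework, not mandated by the paper); the layer indices are our own mapping to conv1_1 through conv5_1 in torchvision's VGG-19:

```python
import torch
from torchvision import models

# Load the pre-trained VGG-19 and freeze it: it only serves as a feature extractor.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Indices of conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 in vgg19.features.
LAYERS = [0, 5, 10, 19, 28]

def extract_features(x, layers=LAYERS):
    """Run x through VGG-19, collecting the activations at the given layers."""
    feats = []
    for i, module in enumerate(vgg):
        x = module(x)
        if i in layers:
            feats.append(x)
    return feats
```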
2.2 Cost function
The total cost is a weighted combination of a content term and a style term:

$$\mathcal{L}(C, S, G) = \alpha \, \mathcal{L}_{content}(C, G) + \beta \, \mathcal{L}_{style}(S, G) \tag{1}$$

with $G$ being the resulting image that we want to obtain from the process. $G$ is generally initialized as a white-noise image; we hereafter refer to $G$ as the generated image. $G$ has the shape of the content image. $C$ and $S$ denote the content and style images, and the weights $\alpha$ and $\beta$ balance content fidelity against style fidelity.
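A minimal sketch of this setup, assuming PyTorch (the image size and learning rate below are illustrative):

```python
import torch

# Placeholder content image tensor; in practice C is a preprocessed photograph.
C = torch.rand(1, 3, 224, 224)

# G starts as white noise with the shape of the content image; its pixels are
# the only parameters being optimized -- the network weights stay frozen.
G = torch.randn_like(C).requires_grad_(True)
optimizer = torch.optim.Adam([G], lr=0.02)
```

Each optimization step then evaluates equation (1) on $G$, backpropagates, and updates $G$'s pixels directly, using the content and style losses defined in the next two subsections.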
2.2.1 Content loss function
Generally, each layer in the network defines a non-linear filter bank whose complexity increases with the position of the layer in the network. Hence a given input image $C$ is encoded in each layer of the CNN by the filter responses to that image. Say we use a hidden layer $l$ to compute the content cost, and let $a^{[l](C)}$ and $a^{[l](G)}$ be the activations of layer $l$ on the content image and the generated image, respectively. We define the squared-error loss between the two feature representations in layer $l$:

$$\mathcal{L}_{content}^{[l]}(C, G) = \frac{1}{2} \sum_{i,j,k} \left( a_{i,j,k}^{[l](C)} - a_{i,j,k}^{[l](G)} \right)^{2} \tag{2}$$

We use the feature space provided by the 16 convolutional and 5 pooling layers of the 19-layer VGG network. That means we compute (2) in 21 layers in total and sum the results.
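As a direct transcription of equation (2), a minimal sketch (PyTorch assumed):

```python
import torch

def content_loss(a_C, a_G):
    """Equation (2): squared-error between the content and generated
    activations of a single layer (tensors of identical shape)."""
    return 0.5 * torch.sum((a_C - a_G) ** 2)
```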
2.2.2 Style loss function
On top of the CNN responses in each layer of the network, we build a style representation that computes the correlations between the different filter responses, where the expectation is taken over the spatial extent of the input image. These feature correlations are given by the Gram matrix. Take a hidden layer $l$; then $G^{[l]}$ is of shape $n_c^{[l]} \times n_c^{[l]}$, with $n_c^{[l]}$ being the number of channels (the third dimension) of the activation at layer $l$.

In each hidden layer $l$, we create two Gram matrices, one for the style image and one for the generated image, $G^{[l](S)}$ and $G^{[l](G)}$, respectively. The entry $G_{k,k'}^{[l]}$ measures how correlated the activations in channel $k$ are with the activations in channel $k'$, as follows:

$$G_{k,k'}^{[l]} = \sum_{i} \sum_{j} a_{i,j,k}^{[l]} \, a_{i,j,k'}^{[l]}$$

with $a_{i,j,k}^{[l]}$ being the activation at position $(i, j, k)$ of layer $l$.
In each layer, the style loss function measures the squared-error loss between $G^{[l](S)}$ and $G^{[l](G)}$, which can be interpreted as how similar in texture the style image and the generated image are:

$$\mathcal{L}_{style}^{[l]}(S, G) = \frac{1}{\left( 2 \, n_c^{[l]} n_H^{[l]} n_W^{[l]} \right)^{2}} \sum_{k} \sum_{k'} \left( G_{k,k'}^{[l](S)} - G_{k,k'}^{[l](G)} \right)^{2} \tag{3}$$

with $n_H^{[l]}$ and $n_W^{[l]}$ being the spatial dimensions of the activation at layer $l$. Similar to what we did for the content loss, when computing the style loss we also do not want to use only one hidden layer. We use the feature space provided by the 16 convolutional and 5 pooling layers of the 19-layer VGG network; that means we compute (3) in 21 layers in total and sum the results.
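The Gram matrix and the per-layer style loss can be sketched as follows (PyTorch assumed; batch size 1 for simplicity):

```python
import torch

def gram_matrix(a):
    """Gram matrix of an activation tensor a of shape (1, n_c, n_H, n_W)."""
    _, n_c, n_h, n_w = a.shape
    feats = a.view(n_c, n_h * n_w)   # flatten the spatial axes
    return feats @ feats.t()         # (n_c, n_c) channel-by-channel correlations

def style_loss_layer(a_S, a_G):
    """Equation (3): squared-error between the Gram matrices of one layer."""
    _, n_c, n_h, n_w = a_S.shape
    diff = gram_matrix(a_S) - gram_matrix(a_G)
    return torch.sum(diff ** 2) / (2.0 * n_c * n_h * n_w) ** 2
```

Summing `style_loss_layer` over the chosen layers and weighting the result by $\beta$ gives the style term of equation (1).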
3 Fast neural style transfer network [3]
While very flexible, the algorithm proposed above is expensive to run due to the optimization loop being carried out. Optimizing an image or photograph to obey these constraints is computationally expensive and contains no learned representation for artistic style. In August 2017, a group of researchers from Google Brain presented a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair [3]. The paper introduced a new algorithm for fast, arbitrary artistic style transfer trained on roughly 80,000 paintings that can operate in real time on never previously observed paintings.
Figure 2: Model architecture. The style prediction network $P(\cdot)$ predicts an embedding vector $\vec{S}$ from an input style image, which supplies a set of normalization constants for the style transfer network. The style transfer network transforms the photograph into a stylized representation. The content and style losses are derived from the distance in the representational space of the VGG image classification network. The style transfer network largely follows [2], and the style prediction network largely follows the Inception-v3 architecture.
Previous work introduced a second network, a style transfer network $T(\cdot)$, to learn a transformation from the content image $c$ to its artistically rendered version. The style transfer network is a convolutional neural network formulated in the structure of an encoder/decoder. The training objective is the combination of style loss and content loss obtained by replacing $G$ in Equation (1) with the network output $T(c)$. The parameters of the style transfer network are trained by minimizing this objective using a corpus of photographic images as content. The resulting network may artistically render an image dramatically faster, but a separate network must be learned for each painting style.
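To fix ideas, here is a deliberately tiny encoder/decoder sketch of such a network (PyTorch assumed; the real architecture in [2] is deeper, with residual blocks between the downsampling and upsampling halves, and the channel widths here are illustrative):

```python
import torch
import torch.nn as nn

class TransferNet(nn.Module):
    """A toy style transfer network T(.): the encoder downsamples, the decoder upsamples."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=9, padding=4),
            nn.InstanceNorm2d(32, affine=True), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(64, affine=True), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.InstanceNorm2d(32, affine=True), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=9, padding=4),
            nn.Sigmoid(),  # keep output pixels in [0, 1]
        )

    def forward(self, c):
        # A single forward pass stylizes the photograph; no per-image optimization.
        return self.decoder(self.encoder(c))
```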
Training a new network for each painting is wasteful because painting styles share common visual textures, color palettes, and semantics for parsing the scene of an image. Building a style transfer network that shares its representation across many paintings would provide a rich vocabulary for representing any painting. A simple trick recognized in [2] is to build a style transfer network as a typical encoder/decoder architecture but specialize the normalization parameters to each painting style. This procedure, termed conditional instance normalization, proposes normalizing each unit's activation $z$ as
$$z' = \gamma_{S} \left( \frac{z - \mu}{\sigma} \right) + \beta_{S}$$

where $\mu$ and $\sigma$ are the mean and standard deviation across the spatial axes of an activation map, and $(\gamma_{S}, \beta_{S})$ constitute a linear transformation that specifies the learned mean and learned standard deviation of the unit. This linear transformation is unique to each painting style $S$.
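A sketch of conditional instance normalization (PyTorch assumed; the class and parameter names are ours):

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm2d(nn.Module):
    """Instance normalization with one learned (gamma, beta) pair per painting style."""
    def __init__(self, num_channels, num_styles):
        super().__init__()
        # Shared normalization: (z - mu) / sigma over the spatial axes, per channel.
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        # Per-style learned scale and shift, selected by an integer style index.
        self.gamma = nn.Parameter(torch.ones(num_styles, num_channels))
        self.beta = nn.Parameter(torch.zeros(num_styles, num_channels))

    def forward(self, z, style_idx):
        z = self.norm(z)
        g = self.gamma[style_idx].view(1, -1, 1, 1)
        b = self.beta[style_idx].view(1, -1, 1, 1)
        return g * z + b
```

Switching styles at inference time is then just a matter of selecting a different `style_idx`, with all convolutional weights shared across styles.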
II REFERENCES
[1] L. A. Gatys, A. S. Ecker, and M. Bethge. A Neural Algorithm of Artistic Style. arXiv preprint arXiv:1508.06576, 2015.
[2] V. Dumoulin, J. Shlens, and M. Kudlur. A Learned Representation for Artistic Style. International Conference on Learning Representations (ICLR), 2017.
[3] G. Ghiasi, H. Lee, M. Kudlur, V. Dumoulin, and J. Shlens. Exploring the Structure of a Real-Time, Arbitrary Neural Artistic Stylization Network. Proceedings of the British Machine Vision Conference (BMVC), 2017.