
Design and programming of image processing for bacteria counting and classification for the scan machine




DOCUMENT INFORMATION

Basic information

Title: Design and Programming of Image Processing for Bacteria Counting and Classification for the Scan Machine
Author: Tạ Yến Nhi
Supervisor: TS. Nguyễn Văn Thái
University: Ho Chi Minh City University of Technology and Education
Major: Automation and Control Engineering Technology
Document type: Graduation thesis
Year: 2023
City: Ho Chi Minh City
Pages: 110
File size: 9.62 MB

Structure

  • CHAPTER 1. INTRODUCTION
    • 1.1. Problem Statement
    • 1.2. The objective (goals)
    • 1.3. Research Scope
    • 1.4. Research limit
    • 1.5. Research content
    • 1.6. Report structure
  • CHAPTER 2. LITERATURE REVIEW
    • 2.1. Machine Learning
      • 2.1.1. Introduction
      • 2.1.2. Machine Learning Workflow
    • 2.2. Image Segmentation
      • 2.2.1. Introduction
      • 2.2.2. The input and output of the Image Segmentation problem
      • 2.2.3. The Applications of Image Segmentation
    • 2.3. U-Net: Convolutional Networks for Biomedical Image Segmentation
      • 2.3.1. Introduction
      • 2.3.2. Network Architecture
      • 2.3.3. Understanding Convolution, Max Pooling, and Transposed Convolution
        • 2.3.3.1. Convolution operation
        • 2.3.3.2. Max pooling operation
        • 2.3.3.3. Upsampling
      • 2.3.4. Activation, loss function, metrics
        • 2.3.4.1. Activation function
        • 2.3.4.2. Loss function
        • 2.3.4.3. Evaluation metrics
      • 2.3.5. ResNet50 (Residual Network)
      • 2.3.6. Adam optimizer
    • 2.4. Segmentation using Morphological Watersheds
    • 2.5. Camera Calibration
      • 2.5.1. Camera calibration in computer vision
      • 2.5.2. Using Zhang’s Technique to Find Intrinsic Parameters
        • 2.5.2.1. Pinhole camera model
        • 2.5.2.2. The projection matrix
    • 2.6. Segmentation in RGB Vector Space
    • 2.7. Windows API (Win32 API)
    • 2.8. Light in machine vision
      • 2.8.1. The importance of light in machine vision tasks
      • 2.8.2. Light in machine vision
      • 2.8.3. Lighting technique
    • 2.9. Data Storage
      • 2.9.1. SQLite
      • 2.9.2. XML
  • CHAPTER 3. SYSTEM DESIGN AND IMPLEMENTATION
    • 3.1. Hardware Design
      • 3.1.1. Hardware design requirements
      • 3.1.2. Design of an image acquisition system
        • 3.1.2.1. Camera
        • 3.1.2.2. Lens
      • 3.1.3. Lighting system design
        • 3.1.3.1. Diagram of the lighting system
        • 3.1.3.2. Select dome light
        • 3.1.3.3. Select backlight
    • 3.2. Software Design
      • 3.2.1. Program design of the software
        • 3.2.1.1. Software design requirements
        • 3.2.1.2. Software design
      • 3.2.2. Operational interface design
    • 3.3. System construction
      • 3.3.1. Prepare the dataset and train the model
      • 3.3.2. Predict
      • 3.3.3. Windows API communication
      • 3.3.4. Separation and counting of colonies
        • 3.3.4.1. Separation of colonies and counting the number of whole colonies
        • 3.3.4.2. Conditional colony counting algorithm
      • 3.3.5. Database storage
        • 3.3.5.1. The sample-related datatable
        • 3.3.5.2. The preset parameter-related datatable
  • CHAPTER 4. RESULTS, OBSERVATIONS, AND EVALUATIONS
    • 4.1. The hardware of the scanning machine for bacterial colony counting
    • 4.2. The images of bacteria collected from the scanning machine
    • 4.3. Results of training the petri dish recognition model
    • 4.4. Results of training the colonies recognition model
    • 4.5. The result of the bacterial colony separation algorithm
    • 4.6. The result of converting from pixel coordinates to real-world coordinates
    • 4.7. Displaying the results on the Windows interface
    • 4.8. Evaluate
  • CHAPTER 5. CONCLUSION
    • 5.1. Conclusion
    • 5.2. Future development

Contents

FACULTY FOR HIGH QUALITY TRAINING
GRADUATION THESIS
AUTOMATION AND CONTROL ENGINEERING TECHNOLOGY
DESIGN AND PROGRAMMING OF IMAGE PROCESSING FOR BACTERIA COUNTING AND CLASSIFICATION FOR THE SCAN MACHINE

Problem Statement

Manually counting bacterial colonies involves visually distinguishing them by size and shape, often leading to human errors and potential research bias. This labor-intensive process can be time-consuming, especially when samples contain a high number of colonies, making accurate counting challenging.

In laboratory settings, various techniques exist for culturing and counting bacteria, with hand-held colony counters commonly utilized in small labs, universities, and hospitals to assess bacterial quantities in samples. While this method is cost-effective, it poses challenges in terms of time efficiency and long-term data management.

Interscience's colony counters effectively tackle the issues of time efficiency and precision; however, their high price point may be a barrier for some users. Furthermore, these machines lack an integrated barcode scanning feature, which limits their capabilities in data storage and retrieval.

I am committed to creating a colony-scanning device and software for the Windows operating system, designed to connect effortlessly to personal computers via USB. This device will include a barcode scanning feature, enabling users to easily retrieve data.

The objective (goals)

The development of the scanning machine focuses on converting bacterial sample dishes into image or PDF formats. Alongside this, an application will be designed to facilitate the counting of colonies in petri dishes, ensuring that sample data is securely stored for easy future verification. The machine and application will connect seamlessly to personal computers or laptops.

Research Scope

In this topic, the following problems will be solved:

• Design a lighting system for the machine

• Research image segmentation algorithms to figure out the best solution for the colony-counting problem

• Research ways to tackle the challenge of training with limited practical data, using the U-Net model and the PyTorch machine learning framework

• Research an algorithm designed to effectively distinguish neighboring bacterial objects

• Use Visual Studio and the C# language to build an application for the Windows operating system

• Research the connection between the C# application and the Python model.

Research limit

Because of limited time and the limited data from the supporting company, the following goals have not been reached:

• Connecting the application over Wi-Fi to collect feedback data (images, parameters, …) from clients who use our product, in order to improve the dataset

• The system cannot be used for objects other than Petri dishes, and it supports only a limited set of colony types

• The application can be used on the Windows operating system only.

Research content

• Research, analysis, and evaluation of colony counters from the last 10 years

• Research devices such as LEDs, cameras, and lenses, and design a lighting system

• Research and select a power supply for the light system

• Research the integration of the U-Net model and Watershed algorithms, referencing relevant scientific documents; study the datasheets of Basler cameras and barcode scanners for image processing; develop a Graphical User Interface (GUI) in C#; and build proficiency in both Python and C# to facilitate seamless communication between the two programming environments

• Research colony types and some culture mediums, petri types, and bacteriological culture methods at a basic level

• Research colony count methods and estimate some important parameters (CFU/ml, Area(%), …)

• Research how to create an auto-label file for the VIA labeling application, which saves about 40% of the total data-labeling time

• Display parameters and labels of the colonies, enabling clients to edit and classify colonies by size in the application

• Correct the algorithms and optimize the code and the application

Report structure

Chapter 1: Introduction

Introduces the topic and presents the reasons for choosing it, the targets, the research content, and the research limits.

Chapter 2: Literature Review

This chapter explores key concepts in machine learning, including object detection, image segmentation, and cluster splitting, while outlining the machine learning workflow. It also provides an overview of the applications and libraries essential for developing graphical user interfaces, and discusses methods for facilitating communication between different programming languages.

Chapter 3: System Design and Implementation

This chapter details the hardware design and lighting diagram, alongside the software implementation process. It includes an explanation of the software flowchart, which outlines the counting algorithm and the methods used for color and size checking. It also presents the outcomes of the Windows interface implementation, showcasing the effectiveness of the overall design and functionality.

Chapter 4: Results, observations, and Evaluations

This chapter will present the achieved results of the topic, followed by observations and evaluations.

Chapter 5: Conclusion and future development

This chapter will present my conclusion on the research findings and the future development directions for the proposed topic.

LITERATURE REVIEW

Machine Learning

Machine learning (ML), a subset of artificial intelligence (AI), empowers computers to enhance their performance by learning from sample data or previous experiences. This technology allows machines to make predictions and decisions autonomously, without the need for explicit programming.

Machine learning problems can be categorized into two main types: prediction and classification. Prediction tasks focus on estimating values, such as forecasting house and car prices, while classification tasks involve identifying and categorizing data, as in handwriting and object recognition.

Workflow of every Machine Learning problem:

Data collection is essential for effective machine learning, requiring a reliable dataset of objects sourced from trusted providers or public datasets. This process, which can take up to 70-80% of the total project time, is critical for ensuring accurate results and enhancing learning outcomes. A high-quality dataset significantly influences the performance of the machine learning model, making it a foundational element for success.

Preprocessing is a critical phase in data analysis that encompasses data standardization, attribute selection, labeling, encoding, feature extraction, and data reduction. This step is essential for achieving optimal results and is often the most time-consuming part of the process, as it correlates directly with the volume of collected data.

In the model training phase, we utilize preprocessed data to iteratively construct, train, and evaluate the model, focusing on optimizing accuracy. This process is essential for identifying the most suitable model for our needs.

Evaluating the model: After training, the model is evaluated using test data. A model with over 80% accuracy is considered good.

Improvement: If the accuracy is unsatisfactory, the models are retrained iteratively until the desired accuracy is achieved; this takes approximately 30% of the overall execution time.

Figure 2-1 The workflow of a machine learning problem.

Image Segmentation

Figure 2-2 Problems and applications of computer vision

(Image source: https://phamdinhkhanh.github.io/2020/06/10/ImageSegmention.html)

Computer vision addresses a range of challenges. Image classification is a fundamental task that categorizes images, and object detection enhances this by pinpointing the locations of identified objects. Face detection and recognition play a critical role in attendance systems and enterprise management. Content extraction is essential for efficient image search and retrieval, while Optical Character Recognition (OCR) facilitates the digitization of text and documents. Image segmentation further refines the analysis of visual data, and is needed for a deeper understanding of image content beyond object detection and classification.

In image segmentation, the primary objective is to pinpoint specific regions within an image that contain distinct objects, necessitating the labeling of each individual pixel. This process demands a higher accuracy standard than object detection, as it requires precise prediction labels for every pixel in the image.

Figure 2-3 The output of Object Detection and Segmentation

(Image source: https://phamdinhkhanh.github.io/2020/06/10/ImageSegmention.html)

2.2.2 The input and output of the Image Segmentation problem

Image segmentation begins with an input image, which can be in formats like digital photographs or video frames. This image is divided into meaningful regions or objects, facilitating further analysis and processing.

Image segmentation produces a clear distinction between the various regions or objects in an image, typically represented as a segmented image or binary mask. In this output, each pixel is assigned a specific label or class that corresponds to its respective region or object. This output captures the spatial layout and boundaries of the objects within the original image, making it crucial for various applications in computer vision and analysis.

Figure 2-4 a) Input image b) Output image (Image is copyright by LABone)

2.2.3 The Applications of Image Segmentation

Image Segmentation has numerous applications in medicine, autonomous driving, and satellite image processing

Image segmentation algorithms play a crucial role in medicine by aiding doctors in the diagnosis of tumors from X-ray images. These algorithms not only pinpoint the location of tumors but also offer valuable insights into their shape, enhancing the overall diagnostic process.

Autonomous driving relies on continuous perception, processing, and planning to navigate dynamic environments safely. To ensure high accuracy in decision-making, these systems must effectively detect critical objects, including pedestrians, traffic lights, road signs, lane markings, and other vehicles.

Satellite image processing involves the continuous collection of surface images of various regions by orbiting satellites. These images can be analyzed using an image segmentation model, which categorizes the visuals into distinct elements such as roads, neighborhoods, bodies of water, and vegetation.

U-Net: Convolutional Networks for Biomedical Image Segmentation

U-Net (as the name suggests, with a U-shaped architecture) was originally designed for medical image segmentation. So how does U-Net differ from other architectures like DeepLab and Mask R-CNN?

U-Net distinguishes itself from traditional end-to-end models by eliminating the fully connected layers commonly found in the final stages of deep learning architectures. Instead, U-Net leverages the second half of its U-shaped structure to manage feature connections, allowing it to accommodate inputs of different sizes with enhanced flexibility. This design enhances the network's adaptability and efficiency in processing diverse data.

U-Net utilizes the padding method to enhance image segmentation capabilities, ensuring comprehensive analysis. This approach is crucial for effective segmentation, especially when GPU memory constraints limit the image resolution.

Figure 2-5 U-net architecture (example for 32x32 pixels in the lowest resolution)[15]

Each blue box represents a multi-channel feature map, with the number of channels indicated at the top and the x-y dimensions displayed at the lower left corner. White boxes illustrate the copied feature maps, while arrows indicate the various operations performed.

Olaf Ronneberger and colleagues introduced the U-Net architecture, specifically designed for biomedical image segmentation. This model features two primary components: the encoder, or contraction path, which captures contextual information through a series of convolutional and max pooling layers.

The U-Net architecture pairs this traditional stacking approach with a decoder, or symmetric expanding path, that enables precise localization through transposed convolutions. As a fully convolutional network (FCN), the U-Net exclusively uses convolutional layers, omitting dense layers entirely. This design enables the network to process images of varying sizes.
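To make the encoder/decoder and skip-connection ideas concrete, here is a minimal single-level U-Net-style sketch in PyTorch. It is illustrative only: the channel counts are arbitrary, and the ResNet50 backbone discussed in Section 2.3.5 is omitted for brevity.

```python
# A minimal U-Net-style sketch (illustrative only, not the thesis model).
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.enc1 = double_conv(in_ch, 64)                  # contraction path
        self.enc2 = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)  # expanding path
        self.dec1 = double_conv(128, 64)      # 128 = 64 (skip) + 64 (upsampled)
        self.head = nn.Conv2d(64, out_ch, 1)  # 1x1 conv: per-pixel logits

    def forward(self, x):
        s1 = self.enc1(x)                 # feature map kept for the skip link
        x = self.enc2(self.pool(s1))
        x = self.up(x)
        x = torch.cat([s1, x], dim=1)     # copy-and-concatenate (white boxes)
        return self.head(self.dec1(x))

mask_logits = TinyUNet()(torch.randn(1, 3, 256, 256))  # -> (1, 1, 256, 256)
```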

Figure 2-6 A detailed explanation of the architecture

(Image source: https://medium.com/towards-data-science/understanding-semantic- segmentation-with-unet-6be4f42d4b47)

Convolution: The symbol for the convolution operation is denoted as (⊗), and the expression Y = X ⊗ W represents the convolution of X with W.

Figure 2-7 The image represents the convolution of X with W

Matrix A has the same size as W, with x22 as its center. We can calculate each element of Y as follows:

In the example above, we have y11 = 0. Perform similar computations for the remaining elements of Y.

Padding: After each convolution operation, the size of the matrix Y is smaller than X

To ensure that the resulting matrix Y matches the dimensions of matrix X, it is essential to address the border elements. One effective method to achieve this is by applying zero padding around the outer edges of matrix X.

In this situation, the element y11 after the convolution operation is 6.

In the stride process, we sequentially analyze each element in matrix X to generate matrix Y, which maintains the same dimensions as matrix X. This method is known as a stride of 1, where we advance through the matrix one element at a time.

However, if the stride is s (s > 1), we only perform the convolution operation on the elements x(1+i·s, 1+j·s), where i and j are integers.

Stride is commonly used to reduce the size of the matrix after the convolution operation

Figure 2-9 Computing the output values of a discrete convolution for N = 2, i1 = i2 = …

The general formula for computing the size of the convolution of a matrix X of size m × n with a kernel of size k × k, stride s, and padding p, is as follows:

m_out = ⌊(m + 2p − k) / s⌋ + 1, n_out = ⌊(n + 2p − k) / s⌋ + 1
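As a quick check of this formula, the sketch below (with arbitrarily chosen values) compares the computed size against the actual output of a PyTorch convolution:

```python
# Sketch: verifying n_out = (n_in + 2p - k) // s + 1 against nn.Conv2d.
import torch
import torch.nn as nn

n_in, k, s, p = 7, 3, 2, 1
conv = nn.Conv2d(1, 1, kernel_size=k, stride=s, padding=p)
y = conv(torch.randn(1, 1, n_in, n_in))

print((n_in + 2 * p - k) // s + 1)   # 4, the size predicted by the formula
print(y.shape[-1])                   # 4, the actual output width
```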

Transposed convolution is a process that reverses the standard convolution operation. When applying a 3x3 kernel with padding 0 and stride 1 to an input image, the transposed convolution generates an output image by overlapping and summing square cells. The rules for stride and padding remain consistent with those used in traditional convolution, ensuring a coherent transformation of the input image into the desired output format.

Figure 2-10 The image above has dimensions of 6x6, while the image below has dimensions of 4x4 The kernel has a size of 3x3
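The following sketch reproduces the situation in Figure 2-10 with PyTorch: a 3×3 kernel with stride 1 and padding 0 maps a 4×4 input to a 6×6 output, since for a transposed convolution n_out = (n_in − 1)·s − 2p + k.

```python
# Sketch of the transposed convolution in Figure 2-10: 4x4 in, 6x6 out.
import torch
import torch.nn as nn

upconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=1, padding=0)
x = torch.randn(1, 1, 4, 4)
print(upconv(x).shape)   # torch.Size([1, 1, 6, 6]): (4 - 1)*1 - 0 + 3 = 6
```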

2.3.3 Understanding Convolution, Max Pooling, and Transposed Convolution

There are two inputs to a convolutional operation:

• The input image is structured as an array or matrix, organized into columns (width), rows (height), and depth (channels). For grayscale images, the depth is 1, while for color images it is typically 3. Therefore, the dimensions of the input can be represented as nin × nin × channels.

• K filters (also called kernels or feature extractors), each of size (f × f × channels), where f is odd, typically 3 or 5

• The output of a convolutional operation is also a 3D array or matrix (also called an output image or feature map), and the output size is nout × nout × k.

The relationship between nin and nout is given by equation (2.3), where p is the padding and s is the stride:

nout = ⌊(nin + 2p − f) / s⌋ + 1 (2.3)

• nin: number of input features

• nout: number of output features

Pooling functions are designed to reduce the feature map size, resulting in fewer parameters within the network. Typically placed between convolutional layers, pooling layers effectively reduce data size while maintaining essential features. This reduction in data size decreases the number of computational operations within the model.

Figure 2-11 Max pooling with 2×2 filters and stride 2

(Image source: https://medium.com/towards-data-science/understanding-semantic- segmentation-with-unet-6be4f42d4b47)

The convolution and pooling operations reduce the size of an image through a process called down-sampling. For instance, an image that initially measures 4x4 pixels is reduced to 2x2 pixels after pooling, effectively converting a high-resolution image into a lower-resolution one.

Figure 2-12 After applying a pooling layer 2 ×2

(Image source: https://medium.com/towards-data-science/understanding-semantic- segmentation-with-unet-6be4f42d4b47)

By downsampling, the model better understands “what” is present in the image, but it loses the information of “where” it is present [5]
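A minimal sketch of the 2×2, stride-2 max pooling of Figure 2-11 in PyTorch, with arbitrary input values:

```python
# Each output value is the maximum of a non-overlapping 2x2 window,
# halving the spatial size (4x4 -> 2x2).
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 3., 2., 4.],
                    [5., 6., 7., 8.],
                    [3., 2., 1., 0.],
                    [1., 2., 3., 4.]]]])
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x))   # tensor([[[[6., 8.], [3., 4.]]]])
```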

Semantic segmentation provides a comprehensive image analysis by classifying each pixel, unlike methods that only offer class labels or bounding boxes. Standard convolutional networks, which utilize pooling and dense layers, tend to discard crucial spatial information, focusing solely on object identification. This loss of "where" information is inadequate for effective segmentation, which needs techniques that preserve both the "what" and the "where". To recover the "where" information, upsampling is necessary to convert a low-resolution image into a high-resolution one. Transposed convolution is a commonly used technique for upsampling in state-of-the-art networks.

Each output of a layer (excluding the input layer) is calculated using the formula:

a(l) = f(l)(W(l)·a(l−1) + b(l))

where W(l) and b(l) are the weight matrix and bias of layer l, and f(l) is its activation function.

In neural networks, a nonlinear activation function, denoted as f(l), is crucial for introducing complexity to the model. When a layer employs a linear activation function, it can be merged with the subsequent layer, as the composition of linear functions is itself a linear function.

In this project, we utilize the sigmoid activation function, a nonlinear function that transforms real-valued inputs into outputs in the range (0, 1). This function is widely used to represent probabilities across various applications. A key feature of the sigmoid function is that small input variations produce only small output changes, allowing for smoother and more continuous output.

The formula for the sigmoid function and its derivative is presented in equations (2.5) and (2.6):

σ(x) = 1 / (1 + e^(−x)) (2.5)

σ′(x) = σ(x)·(1 − σ(x)) (2.6)

Figure 2-13 Sigmoid function and its derivative

A loss function is essential for measuring the difference between predicted and actual outcomes, acting as a key metric for evaluating a model's predictive quality. When predictions diverge significantly from the desired results, the loss function yields a higher error value, while closer alignment between predictions and actual outcomes results in a lower value, signifying improved accuracy.
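As a concrete illustration, here is a minimal sketch of a Dice-style loss built on the sigmoid activation (a Dice loss appears in the Chapter 4 training plots). The smoothing constant eps is an added assumption for numerical stability, not a value from the thesis.

```python
# Sigmoid + Dice loss sketch for binary segmentation (raw logits as input).
import torch

def dice_loss(logits, target, eps=1.0):
    prob = torch.sigmoid(logits)               # sigma(x) = 1 / (1 + e^-x)
    inter = (prob * target).sum()
    union = prob.sum() + target.sum()
    return 1.0 - (2.0 * inter + eps) / (union + eps)   # 1 - Dice coefficient

logits = torch.randn(1, 1, 64, 64)                     # raw model output
target = torch.randint(0, 2, (1, 1, 64, 64)).float()   # binary ground truth
print(dice_loss(logits, target).item())
```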

Segmentation using Morphological Watersheds

Watersheds can be visualized in three dimensions, combining two spatial coordinates with intensity. The algorithm classifies all points into three distinct types:

(1) Points belonging to a regional minimum

(2) Points at which a drop of water, if placed at the location of any of those points, would fall with certainty to a single minimum

(3) Points at which water would be equally likely to fall to more than one such minimum

A catchment basin, or watershed, is defined by the points that satisfy condition (2). In contrast, points that satisfy condition (3) form crest lines on the topographic surface, known as dividing lines or watershed lines.

Segmentation algorithms aim to identify these watershed lines, which can be illustrated by envisioning holes at the regional minima, with water flooding the landscape from below. As the water level rises, it fills distinct basins, while dams are built to keep them separate. Ultimately, only the tops of these dams remain visible, and they indicate the watershed boundaries determined by the algorithm.

To provide a better visual understanding, we have Figure 2-17 which helps illustrate this algorithm more intuitively

Figure 2-17 a) Grayscale image b) Image plotted as a surface

The 2D graph illustrates the brightness of pixels along a line connecting the points of lowest brightness in the image. This representation allows for a clear distinction among the three types of points identified in the analysis.

Figure 2-18 Three types of points in the Watersheds algorithm

The algorithm operates by treating concave regions as water basins. We begin filling these basins from the lowest depth, ensuring that water from one basin does not merge with another by constructing a single-pixel dam. This dam acts as a barrier and represents the critical line we aim to identify.

Figure 2-19 How the Watershed algorithm works

Camera Calibration

2.5.1 Camera calibration in computer vision

In industrial applications, converting pixel measurements from vision systems to real-world coordinates is essential and requires precise camera calibration. This process involves understanding rigid transformations and the camera's intrinsic parameters, which vary based on the system's configuration, whether it is a fixed camera, a robot-arm-mounted camera, or a stereo/multi-camera setup. Effective calibration identifies key intrinsic parameters, including pixel size, center of projection, principal length, and distortion characteristics, ensuring accurate metric measurements.

2.5.2 Using Zhang’s Technique to Find Intrinsic Parameters

Figure 2-20 Pinhole model of the camera [6]

To convert pixel measurements into real-world coordinates, it is necessary to have a model that captures the process of image formation. This model establishes a mapping between 3D points in the real world and their corresponding 2D projections in images.

The pinhole camera model effectively illustrates this relationship. It conceptualizes a camera as a chamber with a small aperture at the front, allowing light to enter and create a focused image.

The pinhole aperture acts as the optical center of the camera, allowing light rays to pass through and project an inverted image onto the image plane at the back of the chamber. The distance from this optical center to the image plane, known as the principal distance or focal length, plays a crucial role in image formation.

A pinhole camera model uses two coordinate frames:

Figure 2-22 Computing the image coordinates of a real-world object by the similarity of triangles

We define it as follows:

• C: the origin (where the pinhole is), camera center

The ultimate model that converts a point's real-world coordinates (X, Y, Z)^T to its corresponding pixel coordinates (x, y)^T on the image plane can be represented as follows:

The transformation equations above describe nonlinear mappings in Cartesian coordinates that can instead be expressed using homogeneous coordinates. In homogeneous coordinates, the perspective transformation can be represented as a linear matrix equation.

Or, written more compactly: x = hom⁻¹(M_P · hom(X)) (2.16)

• hom⁻¹: converts a homogeneous coordinate back to a Cartesian coordinate

• hom: converts a Cartesian coordinate to a homogeneous coordinate

M_P is the projection matrix and can be decomposed into two matrices as follows:

• M_f represents the internals of the pinhole camera with focal length F

• M_0 describes the transformation between the camera coordinates and the world coordinates

When a camera operates within its own non-canonical coordinate system, it captures 3D points that have undergone rigid body motion. As a result, the projective transformation M_P is applied to the modified points (X′, Y′, Z′, 1)^T, i.e., hom(X′), which have been rotated and translated, instead of the original 3D points represented by hom(X):

x = hom⁻¹(M_P · hom(X′)) = hom⁻¹(M_P · M_rb · hom(X)) (2.18)

M_rb represents a rigid body motion in 3D:

Based on equation (2.9), the motion in equation (2.10) can be written as follows:

We combine M_0 and M_rb into a single matrix as follows:

A realistic camera features a lens with a focal length F and uses a light-sensitive sensor, like a CMOS sensor, to transform captured light into a digital signal. However, this design presents several challenges: manufacturing defects can cause discrepancies in focal length between the x- and y-directions; precise alignment between the lens center and the light sensor center is difficult to achieve; and points in the camera coordinate system must still be mapped accurately to the corresponding points in the image coordinate system.

The focal lengths in the x-direction and y-direction are measured in millimeters, centimeters, or inches. To connect pixels to physical length units, we introduce a factor s (pixel size), which indicates the number of pixels per unit length.

The final sensor coordinates u = (u, v)^T are obtained from the normalized image coordinates x = (x, y)^T as:

u = hom⁻¹(A · hom(x)), where A = ( F·s_x  s_θ  u_c ; 0  F·s_y  v_c ; 0  0  1 ) (A is called the intrinsic matrix)

• s_x, s_y: sensor scales in the x- and y-directions

• s_θ: the skewness (diagonal distortion), which is usually negligible or zero

• (u_c, v_c): the principal point (image center)

where (R | t) are the extrinsic parameters of the projection transformation.

Finding the intrinsic matrix A and the extrinsic matrix (R | t) is an important step in camera calibration. The OpenCV library provides functions that make finding these matrices easier.
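As a sketch of how OpenCV exposes this step, the snippet below runs Zhang-style calibration from chessboard images. The folder name and the 9×6 inner-corner pattern are assumptions; only the 15 mm square size comes from the experiment in Section 4.6.

```python
# Zhang's calibration with OpenCV (pattern size and paths are assumptions).
import glob
import cv2
import numpy as np

pattern = (9, 6)      # inner corners per row/column (assumed)
square = 15.0         # square size in mm (Section 4.6)

# 3D corner positions in the board's own frame (the Z = 0 plane)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.png"):   # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# A: intrinsic matrix, dist: distortion, rvecs/tvecs: extrinsics (R | t)
ret, A, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("intrinsic matrix A:\n", A)
```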

Segmentation in RGB Vector Space

This project employs an algorithm to segment objects within a specified color range in RGB images, in order to classify the colors of various cultured bacteria based on their enzyme production. Using this algorithm, we can compare the colors of the bacteria against predefined parameters in our software.

To achieve effective segmentation, we use a collection of representative color points of the desired hues, from which we estimate the "average" color as the RGB vector a. The goal of the segmentation process is then to classify each RGB pixel in an image according to whether it falls within the defined color range.

To compare colors effectively, it is crucial to define a measure of similarity, with the Euclidean distance being one of the simplest options. In the RGB color space, we denote an arbitrary point as z and consider it similar to another point a if the distance between them is less than a specified threshold D0. The Euclidean distance between z and a is:

D(z, a) = ‖z − a‖ = [(z_R − a_R)² + (z_G − a_G)² + (z_B − a_B)²]^(1/2)

• a_R, a_G, a_B: the RGB components of vector a

• z_R, z_G, z_B: the RGB components of vector z

The locus of points with D(z, a) ≤ D0 forms a solid sphere of radius D0, as shown in Figure 2-23. Points inside the sphere satisfy the specified color criterion, while those outside do not. By assigning distinct colors, such as black and white, to these two groups of points, we create a binary segmented image.

Figure 2-23 Enclosing data regions for RGB vector segmentation in three approaches
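A minimal NumPy sketch of this color-range test follows. The average color a and threshold D0 are arbitrary illustrative values, and note that OpenCV loads images in B, G, R channel order.

```python
# Segmentation in RGB vector space: keep pixels whose Euclidean distance
# to the "average" color a is at most D0 (values are assumptions).
import numpy as np
import cv2

img = cv2.imread("petri.png")              # hypothetical input, BGR order
a = np.array([60.0, 50.0, 170.0])          # average colony color (B, G, R)
D0 = 40.0                                  # similarity threshold

# D(z, a) = ||z - a|| computed for every pixel at once
dist = np.linalg.norm(img.astype(np.float32) - a, axis=2)
mask = np.where(dist <= D0, 255, 0).astype(np.uint8)   # binary segmented image
cv2.imwrite("mask.png", mask)
```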

Windows API (Win32 API)

We utilize WPF for developing the Windows workstation application in C#, while Python is employed for executing machine learning algorithms like U-Net due to its robust capabilities in this domain. To facilitate communication between these two languages, we implement a protocol that enables Python to detect bacteria and transmit the resulting coordinates to the interface for visualization.

Both Python and C# offer libraries and functions that facilitate direct interaction with the operating system and hardware. Windows applications leverage the Windows API to execute various operations on the computer.

Light in machine vision

2.8.1 The importance of light in machine vision tasks

Effective lighting is essential in machine vision, as it directly influences performance and the reliability of results. Selecting the right lighting is crucial for achieving consistent outcomes, while inadequate lighting can lead to poor image quality and compromise the accuracy of software evaluations. Therefore, investing in high-quality lighting is key to ensuring optimal performance in machine vision applications.

Figure 2-24 Comparison images of poor lighting (left) and good lighting (right) (Image source: https://www.xulyanhcongnghiep.com/anh-sang-trong-xu-ly-anh-cong- nghiep+20.html)

When choosing lighting, consider parameters such as desired outcome, application requirements, intensity, color temperature, beam angle, power consumption, lifetime, cost, environmental factors, and compatibility with the system

In machine vision, light is characterized by its wavelength measured in nanometers (nm), which can be either monochromatic, featuring a single color, or white light that encompasses all colors of the visible spectrum.

(Image source: https://www.xulyanhcongnghiep.com/anh-sang-trong-xu-ly-anh-cong- nghiep+20.html)

Visible light ranges from approximately 400 to 700 nm, positioned between infrared (longer wavelengths) and ultraviolet (shorter wavelengths). Specialized applications may use infrared (IR) or ultraviolet (UV) light. Light interacts with materials through reflection, absorption, and transmission.

Light refracts when it transitions between different mediums, altering its direction. This phenomenon is inversely related to wavelength, meaning shorter wavelengths undergo greater refraction. Consequently, shorter-wavelength light is ideal for surface inspections due to its heightened scattering. For particular uses, green light is preferred for detecting scratches, whereas longer wavelengths, such as red light, make materials appear more translucent.

Effective surface inspection requires front illumination to accurately identify flaws and features, with the lighting direction tailored to the specific surface characteristics. For precise measurements of diameter and length, as well as for identifying through-holes, backside illumination is optimal. In cases involving translucent materials or unclear conditions, a mixed lighting approach may be necessary to achieve reliable results.

In bright field illumination, light is partially obstructed or passes through translucent materials. Backlighting technology is widely used, especially for high-precision measurement applications.

Figure 2-26 Light diagram backlight, bright field (Image source: https://www.xulyanhcongnghiep.com/anh-sang-trong-xu-ly-anh- cong-nghiep+20.html)

Figure 2-27 Image of a bottle cap taken with a backlight

(Image source: https://www.xulyanhcongnghiep.com/anh-sang-trong-xu-ly-anh-cong- nghiep+20.html)

Dark field illumination uses backlighting to enhance the visibility of non-flat features in a sample, creating bright features against a dark background. This technique is accomplished by positioning a ring light or bar light behind the translucent sample, so that only scattered light is detected.

Figure 2-28 Light diagram backlight, dark field

(Image source: https://www.xulyanhcongnghiep.com/anh-sang-trong-xu-ly-anh-cong- nghiep+20.html)

For inspecting complex geometric objects and detecting surface features, front-angle illumination is recommended to eliminate reflections. Dome lights are ideal for providing omnidirectional lighting in such applications.

(Image source: https://www.xulyanhcongnghiep.com/anh-sang-trong-xu-ly-anh-cong-nghiep+20.html)

Figure 2-30 The image of a coin taken with a dome light

(Image source: https://www.xulyanhcongnghiep.com/anh-sang-trong-xu-ly-anh-cong-nghiep+20.html)

Data Storage

SQLite is a lightweight relational database management system that functions as a software library, implementing a traditional SQL database engine without the need for a client-server architecture. This design makes SQLite highly efficient and portable, contributing to its widespread use in desktop, mobile, and web applications due to its small footprint and versatility.

SQLite offers numerous advantages. It operates without a client-server model, eliminating the need for configuration or installation, and it conveniently stores a database in a single file, ensuring simplicity and ease of use. Supporting most SQL92 standard query features, SQLite is incredibly lightweight, with a full-featured version under 500 KB, and it can be made even smaller by excluding certain functionalities. Data operations in SQLite are faster than those in client-server database management systems, and it adheres to the ACID properties (Atomicity, Consistency, Isolation, Durability). Its compactness, rapid data retrieval, and user-friendly nature make SQLite a popular choice for embedding in various projects.

While SQLite offers several benefits, it has limitations compared to other database management systems. Its coarse-grained locking mechanism allows multiple concurrent readers but restricts writing to a single user at any given moment. Consequently, SQLite may not be the best option for managing large datasets or meeting continuous high-throughput processing demands.
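As an illustration of how compact such embedded storage is, below is a minimal sqlite3 sketch. The table and column names are hypothetical; the actual datatables of the application are described in Section 3.3.5.

```python
# Minimal SQLite sketch (hypothetical schema, not the thesis datatables).
import sqlite3

con = sqlite3.connect("samples.db")        # the whole database is one file
con.execute("""CREATE TABLE IF NOT EXISTS samples (
    id INTEGER PRIMARY KEY,
    barcode TEXT, preset TEXT,
    colony_count INTEGER, created_at TEXT)""")
con.execute("INSERT INTO samples (barcode, preset, colony_count, created_at) "
            "VALUES (?, ?, ?, datetime('now'))",
            ("SGS-0001", "MCKA/Ecoli", 127))
con.commit()
for row in con.execute("SELECT barcode, colony_count FROM samples"):
    print(row)
con.close()
```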

XML, or eXtensible Markup Language, is an extensible markup language designed for data transmission and description. Its primary purpose is to facilitate data sharing across various platforms and connect systems via the Internet. One key advantage of XML is its independence: it presents data in a text-based format easily understood by most applications. XML files allow for quick reading and parsing of data, making them suitable for data exchange in scenarios like Remote Procedure Calls for web services. However, XML also has drawbacks, chiefly its verbosity: the markup overhead makes XML files larger and slower to parse than more compact formats, which can matter when exchanging large volumes of data.

Figure 2-31 Diagram illustrating syntax rules in an XML document

(Image source: https://topdev.vn/blog/xml-la-gi/)

SYSTEM DESIGN AND IMPLEMENTATION

Hardware Design

The hardware design requirements are described as follows:

• Easy connectivity with the user's computer

• LED indicator for device status notification

• Optimal lighting system (no shadows, no glare)

• Lighting technology: white LED dome

• Lighting system: the combination of top and bottom lighting with a white or black background

• The scanned object must produce sharp, non-distorted images

• Minimum image resolution of 5MP

3.1.2 Design of an image acquisition system

For this project, a high-resolution color camera is essential to capture sharp images and accurately classify bacteria by their colors. As our focus is solely on counting bacteria from images of each Petri dish, a high frame rate (FPS) is not required.

Figure 3-2 The Basler acA5472-17uc USB 3.0 camera

Table 3-1 Table of sensor information

Shutter: Rolling Shutter
Sensor Format: 1″
Sensor Size: 13.1 mm × 8.8 mm
Resolution (H×V): 5472 px × 3648 px

Table 3-2 Table of camera data

Pixel Bit Depth: 10/12 bits
Synchronization: Free run, hardware trigger, software trigger
Exposure Control: Programmable via the camera API, hardware trigger
Power Supply: Via USB 3.0 interface

To keep the hardware height low, it is essential to select a compatible lens with a short focal length. In this case, we opt for an 8mm focal-length lens with a C-mount, ensuring compatibility with the camera.

Figure 3-3 MP2 Machine Vision Series M0814-MP2

Figure 3-4 Computar lens M0814-MP2 dimension

Table 3-3 Table of lens information

Focal Length: 8 mm
Max Aperture: 1:1.4
Max Image Format: 8.8 mm × 6.6 mm

3.1.3.1 Diagram of the lighting system

For this project, we will utilize a 100mm dome light with a white light source, suitable for counting bacteria in a 90mm petri dish, as the diameter of the petri dish typically ranges from 90mm to 150mm.

Figure 3-7 Lighting dimension of the dome light

Table 3-4 Description Table of dome light IDS5-00-100-1-W-24V

Table 3-5 Technical information table of dome light IDS5-00-100-1-W-24V

To enhance image quality and visibility when using a petri dish, we will utilize a backlight source for illumination. This method effectively reduces contrast and minimizes unwanted glare, ensuring that the captured images are clear and high-quality.

Figure 3-9 Lighting dimension of the backlight

Table 3-6 Description Table of backlight BHS4-00-100-X-W-24V

Table 3-7 Technical information table of backlight BHS4-00-100-X-W-24V

To power the two lights, we require a power supply that converts 220V to 24V, as both lights operate on a 24V input voltage. We will use a power adapter for this conversion and choose a controlled power supply so that the brightness can be adjusted as desired.

Figure 3-10 ANG SERIES - Light controller

Figure 3-11 Dimension of the adapter a) Front view b) Side view c) Rear view

Table 3-8 Description Table of light controller ANG-2000-CH2-24V-A1

Product code: ANG-2000-CH2-24V-A1
Intensity: VR analog control (adjustable)
Operation mode: Constant

Table 3-9 Technical information table of light controller ANG-2000-CH2-24V-A1

Input Voltage: 110-240 VAC
Output Voltage: 24 V

Software Design

3.2.1 Program design of the software

• Zooming capability for image magnification

• Timing: up to 1000 bacteria counts per second

• Minimum bacteria counting size: 0.5mm

• Counting modes: automatic and manual control

• Counting on circular petri dishes of 90mm and 100mm

• Counting on surface agar plates, colored bacterial culture media

• Automatic separation of bacterial clusters

• Differentiation of bacterial colors on different types of colored culture media

• Manual control for adding or subtracting bacteria counts

• Data export to printable reports, PDF, images (jpg, png, bmp), Excel

• Source tracking: image/sample numbers/comments/date/time

• Ability to scan barcodes for each sample

• Allows opening images from the computer or capturing from a camera

• Enables creation and storage of parameters for each type of bacteria

We will utilize Visual Studio to develop the bacteria counting software with a user interface based on the WPF platform, which is part of the .NET framework for Windows. WPF, a graphical user interface framework, serves as an enhanced successor to WinForms, offering numerous advantages for modern application development:

• It provides a unified platform for building user interfaces

• It enables easy collaboration between programmers and interface designers

• It offers a common technology for developing user interfaces on both Windows and web browsers

Figure 3-12 Design screen for input user name

Figure 3-13 Design screen for scan barcode

Figure 3-14 Design screen for scan barcode

Figure 3-15 Design of the main screen for counting

Figure 3-16 Preset parameter window a) Plating tab b) Display Options tab

Figure 3-17 Preset parameter tab (color)

Figure 3-18 Design of the screen that helps users edit colonies manually

Figure 3-19 Design screen for displaying the history of session changes

Figure 3-20 Design screen for displaying details of the history of session changes

Figure 3-21 Design screen for displaying app activity history.

System construction

3.3.1 Prepare the dataset and train the model

We will utilize the VGG Image Annotator (VIA) software for data labeling, which operates directly in a web browser without the need for installation. This lightweight application, contained within a single HTML page of under 400 kilobytes, can function offline in most modern web browsers. VIA enables users to label segmented data using various shapes, including polygons, circles, and rectangles.

Figure 3-22 The process of labeling data using VIA

Figure 3-23 An image with a label completed

Manually labeling an entire dataset can be a lengthy process. To expedite this, we can utilize image processing techniques for automatic data labeling. Understanding the JSON file format used by the VIA software is essential for implementation. The auto-labeling process is illustrated in the flowchart in Figure 3-24.

Figure 3-24 Export JSON for auto-label flowchart

Figure 3-25 Find the mask of the petri function

Figure 3-26 Find the mask of colony function

After automatic labeling, we obtain a set of masks for the data; however, these masks are not immediately suitable for training due to potential inaccuracies. This process primarily reduces labeling time, so manual verification is still needed to enhance accuracy. The image below illustrates a comparison between the results of automatic labeling and those refined through manual verification.

Figure 3-27 a) Auto label image b) After manual checking

After the labels have been edited and finalized, we need to create a corresponding mask for each image. The process of creating a mask is shown in Figure 3-28.

Figure 3-28 a) Labeled image b) Mask image
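As a sketch of what rendering masks from a VIA export might look like, the snippet below fills polygon regions with the colony label value 250 used in the dataset description. The VIA 2.x JSON layout and the file names are assumptions; circle and rectangle regions are omitted for brevity.

```python
# Render mask images from a VIA annotation export (simplified sketch).
import json
import cv2
import numpy as np

with open("via_project.json") as f:            # hypothetical export file
    project = json.load(f)

for key, entry in project.items():             # one entry per annotated image
    mask = np.zeros((3648, 5472), np.uint8)    # same size as the camera image
    for region in entry["regions"]:
        sa = region["shape_attributes"]
        if sa["name"] == "polygon":
            pts = np.stack([sa["all_points_x"], sa["all_points_y"]], axis=1)
            cv2.fillPoly(mask, [pts.astype(np.int32)], 250)  # colony label
    cv2.imwrite(entry["filename"] + "_mask.png", mask)
```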

We organize the original image data into three folders: train (70%), val (20%), and test (10%). Each folder is paired with a corresponding mask folder named train_anno, val_anno, and test_anno. It is essential that the mask names match the names of the images in their respective original image folders.

The dataset consists of three distinct parts:

• Train data: The training dataset consists of 70 labeled images (including augmented images), featuring photographs of petri dishes with cultured bacterial colonies incubated at 37°C for at least 24 hours. This is a semantic segmentation problem with two distinct datasets (petri dishes and colonies), and each dataset has two labels (background, object). Therefore, each pixel has a corresponding label (background: 0, petri: 50, colonies: 250).

• Valid data: The validation data consists of 19 pairs of input and output images, following the same structure as the train data

• Test data: The test data consists of 10 pairs of input and output images, following the same structure as the train data

To effectively analyze the dataset, we visualize images to ensure they align accurately with their corresponding masks. If discrepancies arise, we revisit step 2 for the necessary adjustments. The visualization is conducted using the matplotlib library, showcasing the original images and masks as illustrated in Figure 3-30.

Figure 3-30 Visualization of petri data

Figure 3-31 Visualization of colony data

To enhance the dataset for our model, we implement augmentation techniques that create multiple new images with varying brightness and sharpness. This approach allows us to boost the quantity of data without incurring extra labeling time. The resulting modified images are illustrated in Figure 3-32.

Figure 3-32 Augmentation of petri data

Figure 3-33 Augmentation of colony data
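A minimal sketch of such brightness/sharpness augmentation with Pillow follows; the enhancement factors and file names are arbitrary choices, not the settings used in the thesis.

```python
# Brightness/sharpness augmentation sketch: each source image yields
# several modified copies, while the mask is reused unchanged.
from PIL import Image, ImageEnhance

img = Image.open("train/petri_001.png")        # hypothetical training image
for i, (bright, sharp) in enumerate([(0.7, 1.0), (1.3, 1.0), (1.0, 2.0)]):
    out = ImageEnhance.Brightness(img).enhance(bright)
    out = ImageEnhance.Sharpness(out).enhance(sharp)
    out.save(f"train/petri_001_aug{i}.png")
```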

The data training process consists of two main components: one for petri dish detection and another for bacteria detection. We will utilize the flowchart in Figure 3-34 to guide us through these tasks.

Figure 3-35 Get items for training petri (left) and colonies (right)

The question arises as to why a single model isn't trained to detect both petri dishes and colonies simultaneously. The answer is rooted in experimentation, which reveals that the bacteria detection model excels with grayscale input, while the petri dish detection model achieves better results with RGB color input.

The overall dataset used in this project is divided according to the percentage distribution chart in Figure 3-36:

Figure 3-36 Percentage of Dataset with 223 input

After training, we will generate distinct weight files (.pth) for both the petri dish and the colonies. Using these files, we then carry out the detection of dishes and bacteria through the following steps:

• Sequentially load the weight files of the models using the PyTorch library (torch.load("file directory"))

For petri dish detection, we begin by reading and preprocessing the input image. After applying post-processing to the model's output, we generate a binary image where the background is represented by a value of 0, while the petri dish area is indicated by a value of 255. Subsequently, we crop the region containing the petri dish from the original image, effectively reducing its size. This reduction is crucial, as it improves the accuracy of the subsequent bacteria detection.

After isolating the image of the petri dish, we initiate the bacteria detection process. The outcome is a post-processed binary image, where the background is represented by a pixel value of 0, and the areas containing bacteria are indicated by a pixel value of 255.

We have now identified the region with bacteria, but external noise outside the petri dish can still lead to misidentification. To address this issue, we apply the AND operation between the petri mask and the colonies mask, effectively eliminating any interfering factors.

Figure 3-37 Distinguish between the detecting area and the noise area

Details of the steps to perform the identification are presented in the flowchart in Figure 3-38.
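The steps above might look roughly like the sketch below. The predict() helper and the file names are hypothetical, and whole model objects are assumed to have been saved with torch.save; only the two-stage flow and the AND-masking come from the text.

```python
# Two-stage detection sketch: petri dish first, then colonies, then AND.
import torch
import cv2

petri_model = torch.load("petri.pth")    # assumes full models were saved
colony_model = torch.load("colony.pth")
petri_model.eval(); colony_model.eval()

img = cv2.imread("sample.png")
with torch.no_grad():
    # predict() is a hypothetical helper: preprocess, forward pass,
    # post-process into a binary 0/255 mask of the input's size.
    petri_mask = predict(petri_model, img)          # RGB input works best
    x, y, w, h = cv2.boundingRect(petri_mask)       # crop the dish region
    crop = cv2.cvtColor(img[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    colony_mask = predict(colony_model, crop)       # grayscale input works best

# AND the two masks so that noise outside the dish is discarded
final = cv2.bitwise_and(colony_mask, petri_mask[y:y+h, x:x+w])
```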

To facilitate communication between Python and C#, a suitable data type is essential for transmitting and receiving information. In this project, we need to share the coordinates of bacteria, including centroid, bounding box, and contour coordinates, which are all converted into Python lists. JSON (JavaScript Object Notation) emerges as an ideal data type, allowing for the efficient transmission of this information in a single format.

The advantages of using JSON for communication are presented below:

• Simple format: JSON is designed to be easily readable and writable for humans; it is a simple and easy-to-understand data format

• Multi-language support: JSON is supported by the most popular programming languages such as JavaScript, Python, Ruby, PHP, and many others

• Small size: JSON has a smaller size compared to some other data formats, making it easier for data transmission and storage

• Structured data: JSON uses a key-value, array, and object data structure, allowing for processing complex data

• Support for interacting with APIs: JSON is widely used for data transmission between applications and interacting with Application Programming Interfaces (APIs)

Both Python and C# programming languages have libraries that support working with JSON strings and interacting with APIs In this project, the following libraries are used:

Table 3-10 Table of libraries used

API: win32file, win32pipe

• Request is a string (“send_auto_mode”)

• Response is a JSON string containing information of detection (list of center points, bounding boxes, borderlines)
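A sketch of the Python side of this exchange, using the pywin32 primitives listed in Table 3-10, is shown below. The pipe name and the JSON field names are assumptions, and the coordinate values are placeholders.

```python
# Named-pipe server sketch: wait for the C# request, answer with JSON.
import json
import win32pipe, win32file

PIPE = r"\\.\pipe\colony_counter"            # hypothetical pipe name

handle = win32pipe.CreateNamedPipe(
    PIPE, win32pipe.PIPE_ACCESS_DUPLEX,
    win32pipe.PIPE_TYPE_MESSAGE | win32pipe.PIPE_READMODE_MESSAGE
    | win32pipe.PIPE_WAIT,
    1, 65536, 65536, 0, None)
win32pipe.ConnectNamedPipe(handle, None)     # block until the C# app connects

_, data = win32file.ReadFile(handle, 65536)
if data.decode() == "send_auto_mode":        # request string from C#
    response = json.dumps({                  # detection results as one JSON string
        "centers": [[412, 305], [518, 640]],         # placeholder values
        "boxes": [[400, 293, 24, 24], [506, 628, 25, 24]],
        "contours": [],
    })
    win32file.WriteFile(handle, response.encode())
win32file.CloseHandle(handle)
```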

3.3.4 Separation and counting of colonies

3.3.4.1 Separation of colonies and counting the number of whole colonies

The input to the bacteria segmentation algorithm is a binary mask (the detection output). We need to separate the closely located bacteria (enclosed in red regions) as shown in Figure 3-40:

Figure 3-40 Output image of the detection step

The goal of this step is to isolate clustered bacteria into individual cells. For the bacteria depicted in image X, the expected outcome is four distinct bacteria. Failing to separate these clusters significantly increases the risk of inaccurate results.

Figure 3-41 a) Original image b) Target output of Unet c) Wrong count result d) Correct count result

As described in section 2.4, we will use the Watershed algorithm to separate the bacteria. The detailed steps of the algorithm are outlined in the flowchart in Figure 3-42.

Figure 3-42 Flowchart of separation and count of total colonies
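To make the flowchart concrete, below is a sketch of the separation step using OpenCV's Watershed implementation. The marker-generation recipe (distance transform plus a 0.5 threshold) and the file name are assumptions, not the exact parameters used in the thesis.

```python
# Watershed-based colony separation sketch on the binary detection mask.
import cv2
import numpy as np

mask = cv2.imread("colony_mask.png", cv2.IMREAD_GRAYSCALE)  # 0/255 detect output

# Seeds: one regional peak per colony, taken from the distance transform
dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
_, peaks = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)
n_labels, markers = cv2.connectedComponents(peaks.astype(np.uint8))

# Flood from the seeds; watershed writes -1 on the dam pixels between basins
color = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
markers = cv2.watershed(color, markers)

total = n_labels - 1            # label 0 is the background
print("whole colonies counted:", total)
```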

The hardware of the scanning machine for bacterial colony counting

Figure 4-1 The hardware of the scanning machine.

The images of bacteria collected from the scanning machine

We successfully implemented the design of the lighting system and obtained the resulting images of bacteria from the scanning machine, as shown in Figure 4-2.

Figure 4-2 Coliform bacteria (Images are copyright by SGS)

Figure 4-3 Enterococcus bacteria (Images are copyright by SGS)

Figure 4-4 Enterococcus bacteria (Images are copyright by SGS).

Results of training the petri dish recognition model

After training, we obtain the model's performance metrics visualized through the training iterations in Figure 4-5

Figure 4-5 IoU score plot for training petri

Figure 4-6 Dice loss plot for training petri

The best result we achieved was at the 29th training iteration.

Figure 4-7 The metrics at the 29th training iteration

The results obtained on the test dataset:

The evaluation results on the test set of 19 images:

Figure 4-9 The evaluation results on the test set of 19 images

Figure 4-10 The IoU results on the 19 test images

Results of training the colonies recognition model

After training, we obtain the model's performance metrics visualized over the training iterations in Figure 4-11:

Figure 4-11 IoU Score Plot for training colony

Figure 4-12 Dice loss for training colony

The best result we achieved was at the 75th training iteration:

Figure 4-13 The metrics at the 75th training iteration

The results obtained on the test dataset are shown in Figure 4-14, Figure 4-15, Figure 4-16, and Figure 4-17:

Figure 4-14 Predict result (E. coli - Mueller-Hinton Agar, IoU: 0.913)

Figure 4-15 Predict result (Shigella flexneri, IoU: 0.788)

Figure 4-16 Predict result (E. coli - MacConkey Agar, IoU: 0.936)

Figure 4-17 Predict result (E. coli - MacConkey Agar, IoU: 0.892)

The evaluation results on the test set of 19 images:

Figure 4-18 The evaluation results on the test set of 19 images

The result of the bacterial colony separation algorithm

After detecting bacterial colonies and generating a binary mask image, we utilize the Watershed algorithm to differentiate closely situated colonies. The results for each cluster of small bacterial colonies are presented in Figure 4-19.

Figure 4-19 The results for each cluster of small bacterial colonies.

The result of converting from pixel coordinates to real-world coordinates

The calibration from pixel coordinates to real-world coordinates was achieved using Zhang's technique to determine the intrinsic parameters. The experiment utilized a checkerboard with a square size of 15mm and yielded good calibration accuracy.

Figure 4-20 The chessboard is numbered, and each square has a size of 15mm

Table 4-1 Table of distance measurements in real-world coordinates and their corresponding errors

Distance (mm) Error (mm) Distance (mm) Error (mm)

Figure 4-21 The results of bacteria detection (Images are copyrighted by SGS)

The sizes of the bacteria measured within the red region in the above image are represented in Table 4-2

Table 4-2 Table of the sizes of the bacteria measured within the red region

Colony index Diameter (mm) Colony index Diameter (mm)

Displaying the results on the Windows interface

Figure 4-23 Username input in screen result

Figure 4-24 Barcode scan results on the app

Counting colony-forming units of E. coli bacteria on MacConkey Agar plates using the TotalCount parameter (counting the total number of colonies regardless of size and color).

Figure 4-25 The result of counting all colonies on the app

The information displayed on the interface is described in Table 4-3

Table 4-3 Table of the information displayed on the interface

1. Barcode ID of this petri dish

2. Preset parameter set by the user

3. Colony size threshold: colonies smaller than this value will be counted

4. Dilution set by the user

The counting of bacterial colonies on MacConkey Agar plates focuses on identifying E. coli, using the MCKA/Ecoli parameter. This method quantifies the total number of colonies by analyzing the distinct color of the E. coli colonies that develop on the MacConkey Agar medium.

Figure 4-26 The result of counting bacterial colonies on a MacConkey Agar plate with color verification

Figure 4-27 a) MCKA/Ecoli Parameter (Plating) b) MCKA/Ecoli Parameter (Display Options)

Figure 4-28 MCKA/Ecoli Parameter (Color)

Figure 4-29 The result of the edit screen and the manual bacterial colony addition feature

Figure 4-30 The result of the edit screen and the manual bacterial colony deletion feature

Figure 4-31 History of session changes result

Figure 4-32 Details history of session changes result

Evaluate

The model performs basic functions including:

• Ability to zoom in on images

• Minimum bacteria counting size: 0.5mm

• Counting mode: automatic and manual control

• Counting on circular Petri dishes of 90mm and 100mm

• Counting on Petri dishes, surface culture dishes, and colored bacterial culture media

• Automatic separation of bacterial clusters

• Differentiating bacterial colors on different types of colored culture media

• Exporting data to print reports, PDFs, images (jpg, png, bmp), and Excel

• Tracking origin: image/sample number/comments/date/time

• Ability to scan barcodes for each sample

• Allowing image import from device or camera capture

• Enabling the creation and storage of parameters for each type of bacteria

The algorithm's performance is constrained by the limited image data and the small number of bacterial samples collected. To run efficiently, the algorithm relies on Machine Learning and Deep Learning libraries, particularly on GPUs, which greatly increase processing speed. Consequently, on devices without a graphics card the counting algorithm runs approximately 4-5 times slower. These challenges are recognized and will be addressed in future project phases.

CONCLUSION

Conclusion

The research project has progressed to the experimental investigation stage, successfully meeting its basic objectives. However, due to time constraints and a limited number of bacterial samples, results for all bacterial colony types remain incomplete. Furthermore, the algorithm's accuracy is still insufficient, and the interface does not yet fully support all plate samples.

Future development

Areas for improvement and development:

• Collecting additional data and improving existing data for bacterial colony detection on multiple sample types

• Enhancing and adding features to the Windows app

• Seeking solutions to reduce the model's detection time

• Integrating a robotic arm to automate barcode scanning and bacterial colony counting

• Uploading the collected data to a cloud platform, allowing all users within a business to simultaneously observe and enabling developers to easily gather more data for new samples

• Designing a lighting circuit to reduce the cost of the product

REFERENCES

[1] Vietnam Standard TCVN 11039–2015 on Food additives - Microbiological analyses.

[2] Hüseyin Ateş and Ömer Nezih Gerek (2009, September 14-16). An Image-Processing Based Automated Bacteria Colony Counter. Department of Electrical and Electronics Engineering, Anadolu University, 26555 Eskisehir, Turkey.

[3] Alessandro Ferrari, Stefano Lombardi, Alberto Signoroni et al. (2016, July 8). Bacterial Colony Counting with Convolutional Neural Networks in Digital Microbiology Imaging. Information Engineering Dept., University of Brescia, Brescia, Italy; Futura Science Park, Copan Italia S.p.A., Brescia, Italy.

[4] Gabriel M. Alves and Paulo E. Cruvinel (2016, June 30). Customized Computer Vision and Sensor System for Colony Recognition and Live Bacteria Counting in Agriculture. Embrapa Instrumentação, Rua XV de Novembro 1452, São Carlos, SP, 13560-970, Brazil; Universidade Federal de São Carlos, Rod. Washington Luís, s/n, São Carlos, SP, 13565-905, Brazil.

[5] Harshall Lamba (2019, Feb 17). Understanding Convolution, Max Pooling, and Transposed Convolution. https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47

[6] Analytics Vidhya (2020, Sep 15). Camera Calibration - Theory and Implementation. https://medium.com/analytics-vidhya/camera-calibration-theory-and-implementation-b253dad449fb

[7] Zhengyou Zhang (2008, Aug 13). A Flexible New Technique for Camera Calibration. Microsoft Research, One Microsoft Way, Redmond, WA 98052-6399.

[8] Wilhelm Burger (2016, May 16). Zhang's Camera Calibration Algorithm: In-Depth Tutorial and Implementation. University of Applied Sciences Upper Austria.

[9] Rafael C. Gonzalez and Richard E. Woods (2008). Color Image Processing. In Digital Image Processing, third edition, Pearson Education Inc., Upper Saddle River, NJ.

[10] Shruti Jadon (2020, September 3). A Survey of Loss Functions for Semantic Segmentation.

[11] Yao Yuhua et al. (2012). Sigmoid Gradient Vector Flow for Medical Image Segmentation. School of Computer Science, Beijing Institute of Technology.

[12] Karen Simonyan and Andrew Zisserman (2015, April 10). Very Deep Convolutional Networks for Large-Scale Image Recognition. Visual Geometry Group, Department of Engineering Science, University of Oxford.

[13] Sebastian Ruder (2017, Jun 15). An Overview of Gradient Descent Optimization Algorithms. Insight Centre for Data Analytics, NUI Galway; Aylien Ltd., Dublin.

[14] Diederik P. Kingma and Jimmy Lei Ba (2015). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations.

[15] Olaf Ronneberger, Philipp Fischer, and Thomas Brox (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. University of Freiburg.

[16] Vincent Dumoulin and Francesco Visin (2018, January 12). A Guide to Convolution Arithmetic for Deep Learning. MILA, Université de Montréal.

USER MANUAL

Switch on the power by turning the switch on the right side of the device to supply power to the lighting.

After the system is powered, the LED light on the switch will illuminate. Supply power to the adapter to adjust the brightness of the two lights.

Connect the USB port from the device to the computer (use a USB 3.0 port), then start the application.

Screen after launching the application

To proceed, users must enter a valid username in the User Name field; it must be at least 3 characters long and contain no special characters. Once the username is valid, users can click OK to continue.
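A minimal sketch of this validation rule, assuming "no special characters" means letters and digits only:

```python
import re

def is_valid_username(name: str) -> bool:
    """At least 3 characters, letters and digits only (assumed rule)."""
    return re.fullmatch(r"[A-Za-z0-9]{3,}", name) is not None
```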

After selecting OK, the user can choose between two modes: Use barcode device and Open location file.

1. If Open location file is selected, the application switches to the main screen as shown in Figure X, allowing the user to open an image.

2. If Use barcode device is selected, the application switches to the barcode scanning screen for the user to perform barcode scanning. On this screen, the user selects Click here to find a scanner to check whether the computer is connected to the scanning device. If connected, the screen displays a message indicating that the device has been found. The user then selects the COM port for scanning and places the barcode in front of the scanner, and the result is displayed on the screen. Press Next to go to the main screen as shown in the figure below:
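For illustration, reading one code from a serial barcode scanner on the selected COM port might look like the sketch below; pyserial, the port name, baud rate, and newline termination are all assumptions about the scanner:

```python
import serial  # pyserial

def read_barcode(port="COM3", baudrate=9600, timeout=5):
    """Read one newline-terminated barcode from the scanner."""
    with serial.Serial(port, baudrate, timeout=timeout) as scanner:
        line = scanner.readline()  # blocks until newline or timeout
        return line.decode("ascii", errors="ignore").strip()
```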

Open File: Open an image that the user wants to process

Save: Save the current sample information in a folder

Export PDF: Export the current sample as a PDF file

Export Excel: Export the current sample as an Excel file

Add Section: Create exclusion polygon regions

To create a polygon area, click on each point in the image and press the Enter key to finalize the shape. To delete the polygon, select the area and press the Delete key. (A sketch of how such a region can be excluded from counting follows this list.)

Database: Open the database window to access data

History: View the full history of app activities

Zoom in: Zoom in on the image view

Zoom out: Zoom out of the image view

Fit to screen: Reset the zoom and restore the image to its initial state

Eye: Show or hide the colony markers

Delete the current preset parameter

Change the display option (marker style, marker color) parameter in the current preset parameter

Change the color parameter in the current preset parameter

Edit: Turn on manual counting mode

Use: Left-click on the image view to add one colony. Right-click on a colony marker area to remove that colony.

Validate: Insert the current sample into the database
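As noted for the Add Section feature above, an exclusion polygon can be turned into a mask that removes its region from the counting step. A minimal sketch with OpenCV; variable names are illustrative:

```python
import cv2
import numpy as np

def apply_exclusion(colony_mask, polygon_points):
    """Zero out every pixel of the colony mask inside the exclusion polygon."""
    exclusion = np.zeros(colony_mask.shape, dtype=np.uint8)
    pts = np.array(polygon_points, dtype=np.int32).reshape(-1, 1, 2)
    cv2.fillPoly(exclusion, [pts], 255)  # paint the excluded region white
    return cv2.bitwise_and(colony_mask, cv2.bitwise_not(exclusion))
```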

The information entered by the user is listed in the table below:

Adjust the count result limit. If the count result is below or above the set limit, the displayed count result will be shown in red.

Input the sample number if the sample has no scan code.

… concentration). This information will not affect the counting result.

Disk diameter: Input the disk diameter.

Volume: The number of ml of colony culture solution. This information is used to calculate CFU/ml.
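The Dilution and Volume fields feed the standard plate-count formula, which, assuming the dilution is entered as a fraction such as 1e-3, might be computed as:

```python
def cfu_per_ml(colony_count: int, dilution: float, volume_ml: float) -> float:
    """CFU/ml = colonies / (dilution factor x volume plated in ml)."""
    return colony_count / (dilution * volume_ml)

# Example: 120 colonies on a 10^-3 dilution plate with 0.1 ml plated
# -> 120 / (0.001 * 0.1) = 1,200,000 CFU/ml
```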

Use the features above to count bacterial colonies and save them into the database.

Users can view and edit samples in the database by double-clicking on the corresponding data row. This action opens an editing and history viewing screen, as illustrated in the figure below. Once the necessary changes have been made, users can click the Save button to update the information.
