Electrical and Electronic Engineering - Theses

Search Results

Now showing 1 - 10 of 226
  • Item
    Steady-state multi-energy flow problem in coupled energy networks
    Mohammadi, Mohammad ( 2023-07)
    Energy systems worldwide are experiencing a rapid transition towards a low-carbon future, calling for greater energy efficiency and flexibility in utilising energy networks. This transition is accompanied by the growing adoption of multi-energy technologies like combined heat and power (CHP), power-to-heat, and power-to-gas units, leading to rising interdependencies between different energy networks, particularly in district and energy community applications. Hence, more effective utilisation of coupled energy networks (CEN) can bring several efficiency and flexibility benefits. Developing efficient energy flow models for CEN analysis is thus key to fully unlock these inherent benefits. The steady-state multi-energy flow (MEF) problem is the cornerstone of CEN analysis and lays the foundation for optimal operation and flexibility studies. This thesis presents a systematic evaluation of the MEF problem for steady-state analysis of CEN, exemplified in the case of coupled electricity, heat, and gas networks, by highlighting three main principles, namely formulations, coupling strategies, and solution techniques. Regarding formulations, a MEF framework is developed to incorporate various formulations for steady-state analysis of CEN, while demonstrating their impact on the convergence and computational properties of MEF models. With respect to coupling strategies, a systematic analysis is conducted on three essential coupling strategies, i.e., Decoupled, Decomposed, and Integrated, which may be associated with parallel, sequential, and simultaneous MEF computations, respectively. A set of fundamental underlying principles that characterise the coupling effectiveness in the MEF problem is introduced, namely, underlying formulations, problem size, interdependencies between networks, and calculation sequence, while highlighting their role in appraising different coupling strategies. 
In terms of solution techniques, besides the classical Newton-Raphson (NR) method, the performance of other potentially suitable algorithms, such as Quasi-NR, Levenberg-Marquardt, and Trust-Region, is extensively studied for solving MEF problems. A novel cross-over strategy is then proposed to utilise the synergistic benefits of the studied algorithms for improving convergence properties without compromising computational efficiency. Building upon the insights gained from the MEF analysis, a fast decomposed algorithm is introduced for solving large-scale coupled electricity and gas networks with hydrogen injection modelling and gas composition tracking. Finally, this thesis provides novel insights and practical recommendations to identify the most suitable formulations, coupling strategies, and solution techniques for solving the steady-state MEF problem in a robust and computationally efficient way.
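As a rough illustration of the classical Newton-Raphson iteration mentioned in the abstract above, the following minimal sketch solves a small nonlinear mismatch system with a finite-difference Jacobian. The function names and the two-equation `toy_mismatch` system are assumptions made for illustration only; they are not the thesis's MEF formulation or any real network model.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def solve_linear(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("singular Jacobian")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def numeric_jacobian(f: Callable[[Vector], Vector], x: Vector, h: float = 1e-7) -> List[Vector]:
    """Forward-difference approximation of the Jacobian of f at x."""
    fx = f(x)
    jac = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        xp = x[:]
        xp[j] += h
        fp = f(xp)
        for i in range(len(fx)):
            jac[i][j] = (fp[i] - fx[i]) / h
    return jac


def newton_raphson(f: Callable[[Vector], Vector], x0: Vector,
                   tol: float = 1e-9, max_iter: int = 50) -> Tuple[Vector, int]:
    """Iterate x <- x - J(x)^-1 f(x) until the largest mismatch is below tol."""
    x = x0[:]
    for iteration in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x, iteration
        dx = solve_linear(numeric_jacobian(f, x), [-v for v in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton-Raphson did not converge")


def toy_mismatch(x: Vector) -> Vector:
    """Hypothetical two-equation mismatch system (not a real network model)."""
    a, b = x
    return [a * a + b - 3.0, a + b * b - 5.0]


solution, iterations = newton_raphson(toy_mismatch, [1.0, 1.0])
```

Starting from `[1.0, 1.0]`, the iteration settles on the root `a = 1, b = 2` of the toy system. Quasi-NR, Levenberg-Marquardt, and Trust-Region methods differ mainly in how the step `dx` is computed or limited.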
  • Item
    Communication Receivers with Low-Resolution Quantization: Fundamental Limits and Task-based Designs
    Bernardo, Neil Irwin ( 2023-08)
    The use of low-resolution analog-to-digital converters (ADCs) in communication receivers has gained significant interest in the research community since it addresses practical issues in 5G/6G deployment such as massive data processing, high power consumption, and high manufacturing cost. An ADC in a communication receiver is often designed such that its quantization thresholds are equally spaced or the distortion between its input and output is minimized. These design approaches, however, may yield suboptimal performance as they neglect the underlying system task that the ADCs are intended to be used for. This presents an opportunity for us to explore receiver quantization designs that cater to specific communication tasks (e.g. symbol detection, channel estimation) and to understand how quantization impacts various aspects of receiver performance such as error rate, channel capacity, and estimation error. In this thesis, we consider five independent research problems related to communication receivers with low-resolution quantizers. Three of these research problems deal with capacity analysis of certain communication channels with quantized outputs. More precisely, we derive the capacity-achieving input distributions for four different channels with phase-quantized observations and the Gaussian channel with polar-quantized observations. For the channels with a b-bit phase quantizer at the output, a 2^b-phase shift keying modulation scheme can attain the channel capacity. Meanwhile, the capacity can be achieved in the Gaussian channel with polar quantization by an input distribution with amplitude phase shift keying structure. Capacity bounds for the MIMO Gaussian channel with analog combiner and 1-bit sign quantizers are also established in this thesis. The remaining two research problems fall under the category of task-based quantizer design. 
The idea is to design the quantizer in accordance with the underlying system task rather than simply minimize its input-output distortion. Focusing on an M-ary pulse amplitude modulation (PAM) receiver with a symmetric scalar quantizer, the closed-form expression of the symbol error rate is derived as a function of quantizer structure and position of equiprobable PAM symbols. The derived expression is used to design the quantizer according to the symbol detection task. The high signal-to-noise ratio (SNR) behavior of the error rate of the quantized communication system is characterized. Our final work is the development of a new design and analysis framework for task-based quantizers with hybrid analog-to-digital architecture. In contrast to existing task-based quantization frameworks, the theoretical predictions of our proposed framework coincide with the simulated results. Moreover, the proposed framework can be used in data acquisition systems with non-uniform quantizers and observations with unbounded support.
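To make the dependence of symbol error rate on quantizer thresholds concrete, here is a minimal sketch for equiprobable PAM symbols in additive Gaussian noise, where the detector sees only the quantization region and picks the most likely symbol for each region. This is a generic textbook-style calculation under those assumptions; the function names are hypothetical and it is not the closed-form expression derived in the thesis.

```python
import math
from typing import Sequence


def gaussian_cdf(x: float, mean: float, sigma: float) -> float:
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def quantized_pam_ser(symbols: Sequence[float],
                      thresholds: Sequence[float],
                      sigma: float) -> float:
    """Symbol error rate when the receiver observes only the quantizer region.

    Each region between consecutive thresholds is mapped to the symbol that
    is most likely to have produced it (equiprobable symbols assumed).
    """
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    prior = 1.0 / len(symbols)
    p_correct = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Joint probability of (symbol sent, output in this region), maximised
        # over symbols: this is the mass the best detector gets right here.
        p_correct += max(
            prior * (gaussian_cdf(hi, s, sigma) - gaussian_cdf(lo, s, sigma))
            for s in symbols
        )
    return 1.0 - p_correct


# 4-PAM with thresholds at the midpoints between adjacent symbols.
ser_midpoint = quantized_pam_ser([-3.0, -1.0, 1.0, 3.0], [-2.0, 0.0, 2.0], sigma=0.5)
```

With midpoint thresholds the result reduces to the familiar unquantized-detector value 2(1 - 1/M) Q(1/sigma); moving or removing thresholds shows how a coarser or mismatched quantizer changes the error rate.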
  • Item
    Managing Future DER-Rich Distribution Networks with a Distributed Approach: Optimal Power Flow and ADMM
    Gonçalves Givisiez, Arthur ( 2023-06)
    The growing adoption of Distributed Energy Resources (DERs) is causing distribution networks (i.e., both medium voltage [MV] and low voltage [LV] networks) not only to consume power but also to produce it, creating bidirectional power flows that were not anticipated when these networks were designed. This situation is creating challenges for distribution companies in operating their networks, including voltage excursions (i.e., overvoltage or undervoltage) and congestion of transformers and/or conductors. To deal with these challenges, distribution companies have been using rule-based approaches to manage their controllable network assets (e.g., transformers with tap changers) and DERs (e.g., PV systems). However, rule-based approaches are very likely to become impracticable in the future, when the number of DERs is expected to be much higher, increasing the complexity of management. Besides, the larger number of DERs is very likely to require real-time operation of all controllable elements (i.e., DERs, OLTC-fitted transformers), which would inevitably press distribution companies to become much more active in managing these controllable elements. In this context, more advanced techniques will be required to handle the real-time operation of all controllable elements, which will involve a great number of variables (e.g., individual setpoints for controllable elements) and constraints (e.g., voltage and thermal limits) to be simultaneously considered. An advanced technique that has great potential to manage such a complex problem is the AC OPF, but it does not scale to DER-rich, realistic large-scale integrated MV-LV distribution networks. In this PhD project, the following research is carried out to address the scalability issues of the conventional nonconvex AC OPF, particularly found in large-scale problems. Key findings and achievements are also highlighted. 
- An ADMM-based nonconvex three-phase AC OPF tailored for integrated MV-LV distribution networks is proposed. Its performance is tested in DER-rich, realistic large-scale integrated MV-LV networks with more than 20,000 single-phase equivalent nodes and more than 4,600 customers. The proposed ADMM-based nonconvex three-phase AC OPF is shown to be accurate and faster than the conventional approach for large distribution networks.
- A strategy to choose penalty parameters that allows fast convergence of the proposed ADMM-based algorithm is developed. It is based on using a different penalty parameter for each split variable, which facilitates the selection of penalty parameters that are better adapted to each variable, and on using engineering knowledge of distribution networks (i.e., number of houses, typical demand, PV sizes, maximum feeder capacity) to estimate adequate initial values for the penalty parameters, which are then fine-tuned. With the selected penalty parameters, the proposed ADMM-based algorithm converged quickly.
- The implementation and performance assessment of the proposed ADMM-based nonconvex three-phase AC OPF was carried out for four engineering applications: calculation of setpoints for active power of PV systems, calculation of setpoints for active and reactive power of PV systems, calculation of setpoints for active power of PV systems as well as OLTC-fitted transformer tap positions, and calculation of setpoints for active and reactive power of PV systems as well as OLTC-fitted transformer tap positions. The proposed ADMM-based OPF has similar performance to the conventional OPF (i.e., nonconvex three-phase AC OPF) in calculating setpoints that ensure network integrity for all four applications. However, the proposed ADMM-based OPF is much faster than the conventional OPF. Therefore, the quality of the results and the faster solution time across all investigated applications and time-varying conditions make the proposed ADMM-based OPF a good alternative for solving large-scale, DER-rich three-phase AC OPF problems.
- With the ADMM split, which separates the MV network problem from the LV network problems, voltage regulation devices (e.g., OLTC-fitted transformers) located in the MV network cannot sense voltage problems that occur at the end of LV feeders. This happens because the ADMM-based algorithm only shares the split point variables, which are located at the start of LV feeders, where there are no voltage problems. So, the MV network problem does not “know” about the voltage issues at the end of the LV feeder. To enable these voltage regulation devices to sense voltage problems in another subproblem, and hence to correct voltage issues, a novel adaptation of the ADMM-based algorithm is proposed.
- An ADMM-based linearised three-phase AC OPF tailored for integrated MV-LV distribution networks is proposed. Its performance is tested in DER-rich, realistic large-scale integrated MV-LV networks with more than 20,000 single-phase equivalent nodes and more than 4,600 customers. This formulation is faster than the ADMM-based nonconvex three-phase AC OPF and is ready for real-time operation (control cycles of 1 minute) of distribution networks.
- A discussion of other potential applications of the proposed ADMM-based OPF formulations is carried out in the context of bottom-up services provision and TSO-DSO coordination.
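The splitting idea behind ADMM can be sketched on a deliberately tiny problem: two subproblems (standing in, very loosely, for an MV-side and an LV-side problem) each hold their own copy of one shared boundary variable and exchange only that copy. The quadratic costs, the single scalar variable, the single penalty parameter `rho`, and the function name are all assumptions for illustration; the thesis's formulation is a three-phase AC OPF with a separate penalty parameter per split variable.

```python
from typing import Tuple


def consensus_admm(a: float, b: float, rho: float = 1.0,
                   tol: float = 1e-10, max_iter: int = 1000) -> Tuple[float, float, int]:
    """Minimise (x - a)^2 + (z - b)^2 subject to x = z with scaled-form ADMM.

    x is the first subproblem's copy of the shared variable, z the second
    subproblem's copy, and u the scaled dual variable enforcing x = z.
    """
    z = 0.0
    u = 0.0
    for iteration in range(1, max_iter + 1):
        # Subproblem 1: argmin_x (x - a)^2 + (rho / 2) * (x - z + u)^2
        x = (2.0 * a + rho * (z - u)) / (2.0 + rho)
        # Subproblem 2: argmin_z (z - b)^2 + (rho / 2) * (x - z + u)^2
        z_prev = z
        z = (2.0 * b + rho * (x + u)) / (2.0 + rho)
        # Dual update accumulates the remaining disagreement between copies.
        u += x - z
        primal_residual = abs(x - z)
        dual_residual = rho * abs(z - z_prev)
        if primal_residual < tol and dual_residual < tol:
            return x, z, iteration
    raise RuntimeError("ADMM did not converge within max_iter iterations")


x_opt, z_opt, n_iter = consensus_admm(a=1.0, b=3.0)
```

Both copies converge to the compromise value `(a + b) / 2`. The number of iterations depends strongly on `rho`, which is why penalty parameter selection receives its own bullet in the abstract.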
  • Item
    Planning of Future-Proof Low Voltage Residential Networks
    Zeb, Muhammad Zulqarnain ( 2023-05)
    The increasing penetration of residential rooftop photovoltaics (PV) and home charging of electric vehicles (EVs) is presenting technical challenges for distribution companies responsible for managing the poles and wires. These challenges include problems such as voltage rise and asset congestion, which are caused by reverse power flows from PV systems. Additionally, there are issues of voltage drop and asset congestion that result from EV charging. Distribution networks are experiencing these problems because the existing low voltage (LV) networks were not designed for PV and EVs. To host PV and EVs, solutions such as thicker conductors or On-load Tap Changer (OLTC) must be added to the existing networks. Most of the existing LV networks were planned traditionally, by appropriately sizing three-phase conductors of distribution lines and suitable LV distribution transformer to ensure that customer voltages and power flows in network assets (transformer, lines) are within designed limits. To host PV and EVs in traditionally designed networks, many research works in the literature focused on adding smart voltage regulation devices such as OLTC to avoid using thicker conductors for three-phase line segments. Some also suggested the use of thicker conductors for distribution lines with a transformer using nominal voltage setting or an off-load tap changer. However, a detailed cost comparison of the mentioned design has not been done for the brand-new three-phase LV networks with 100% PV and EVs, i.e., when each house has a PV system and an electric car. Such a comparison can help identify the most cost-effective design for brand-new LV networks that can host 100% residential PV and EVs without requiring the addition of solutions later. This thesis fills the mentioned research gap by proposing an optimal power flow (OPF) based methodology to plan the brand-new three-phase LV residential networks for 100% PV and EVs. 
The developed methodology determines suitable conductor sizes and optimal tap changer position (depending on the design alternative) while the topology of the LV network follows the street layout. Additionally, it compares three design alternatives to determine the most cost-effective design. The compared design alternatives include appropriately sized conductors for three-phase line segments with either nominal voltage settings, off-load tap changer fitted transformer, or OLTC fitted transformers. Realistic considerations related to the tap changers, sizes of conductors available in the market, the impact of parallel unbalanced LV feeders on each other, and the impact of connected medium voltage (MV) with their LV parts are included in this planning. The proposed planning methodology is applied on a realistic Australian neighborhood with 89 single-phase residential customers. For the case study neighborhood, it is concluded that the most cost-effective design depends on the distance of the LV transformer from the zone substation (HV/MV transformer). Due to the impact of the connected MV network, voltage varies on the primary side of the LV transformer. The closer the distance, the lower are the voltage variations on the primary side of LV transformer, and therefore, the lower need for voltage regulation. For such LV networks, thicker conductors for lines and a transformer fitted with off-load tap changer provide the most economical design. On the other hand, for a group of customers located far away, the use of a transformer fitted with OLTC, and thinner conductors is the most economical design due to the need for better voltage regulation. The single tap setting of off-load tap changer needs a combination of thicker conductors for the lines, whose cost is not justifiable in such a scenario. 
This analysis helps us understand that no single design alternative is the most economical for all LV neighborhoods; rather, the characteristics of the network must be considered. With the proposed three-phase methodology and its implementation on a neighborhood, this research work provides detailed insight into the most cost-effective design for future LV networks with higher penetrations of PV and EVs. It can guide distribution companies to make their three-phase LV networks future-proof, by selecting the most cost-effective design for neighborhoods with different characteristics.
  • Item
    Distributed Failure-Tolerant Anomaly Detection in Cognitive Radio Networks
    Katzef, Marc ( 2023-04)
    The communications landscape has seen exciting developments through the emergence of small, low-cost, wireless devices. Developments in these devices have led to unprecedented connectivity and distributed computational resources—ready to support new applications. Such applications provide new benefits to end users (through cognitive radio and Internet-of-Things, IoT, to name a few), as well as new attack vectors for malicious users—with a higher number of exposed devices and communications. In this work, we investigate the use of these new wireless networking devices to make wireless communication and networking more secure by analysing wireless activity throughout a network and training anomaly detection models to identify any unusual behaviour. Using their flexible communications, onboard computation, and ability to record wireless network data, we explore state-of-the-art methods to learn patterns in network behaviour using distributed sensing and computational resources. These methods span classical and modern anomaly detection approaches, each with its own benefits and drawbacks in terms of performance, resource usage, and reliability. Throughout this work, the tradeoff between these benefits and drawbacks is outlined and new collaborative anomaly detection methods are proposed. The methods and tools in this thesis have been analysed in various network environments, to strengthen present and future wireless networks.
  • Item
    Thermal Multispectral Imaging and Spectroscopy with Optical Metasurfaces and Deep Learning
    SHAIK, NOOR E KARISHMA ( 2022-12)
    Spectral imaging captures information in one or more selective bands across the electromagnetic spectrum, permitting the objects in the world to be identified by their absorption or reflection characteristics. Advancements in spectrally selective imaging have primarily been in colour imaging in the visible domain; however, infrared detectors have also enjoyed technological advances that position them ideally for thermal spectral imaging. Advanced spectrally selective imaging systems in longwave infrared (LWIR) thermal wavelengths of 8-14 microns can produce unique thermal fingerprints of objects by recording the heat radiation emitted from objects, thereby creating additional knowledge of the world otherwise difficult to acquire with colour cameras. Therefore, advanced spectral imaging finds important applications in precision agriculture (e.g., early detection of plant diseases), non-invasive medical diagnosis (e.g., vein and dental analysis, skin screening), mining (e.g., non-destructive testing), environmental monitoring (e.g., greenhouse gas detection) and recycling (e.g., plastic classification). However, existing LWIR multi- and hyperspectral imaging systems are expensive and bulky (with cryogenic cooling) and demand time and resources to process several images. Further, LWIR spectral imaging is hindered by the lack of materials responding to thermal wavelengths to design wavelength filters and the low resolution of thermal sensors to design a multi-band filter mosaic compared to their counterpart in the visible wavelengths. Recently, miniaturized infrared spectrometers were reported in the thermal wavelengths. However, they work only with a single isolated object using an active blackbody in the background and fail to detect multiple objects in real scenes. They collect an average emission from multiple objects using single or multiple detectors, which cannot be further resolved due to missing spatial information. 
There has been an ever-increasing demand for miniaturized and CMOS-compatible LWIR sensors performing imaging spectroscopy to realize their full potential with increased on-chip integration and new compact applications. In this thesis, I design and demonstrate lightweight and high-performance computational infrared imaging technology to enable joint spatial and spectral data acquisition in LWIR wavelengths. I propose and discuss promising solutions for handheld, mass-producible and affordable LWIR multi- and hyperspectral sensing systems using existing monochrome thermal sensors with a focus on plasmonic filters, sensor engineering and artificial intelligence. The first part of this thesis is focused on designing narrowband filter technology towards LWIR multi- and hyperspectral imagers. I begin by presenting optical metasurfaces and designing nano-optical filters with hexagonal lattices of hole/disk geometries to create surface plasmonic resonances in the LWIR regime. I perform comprehensive detector studies and detailed analyses of nano optical filters to accurately tailor the spectral responsivities of the LWIR plasmonic filters for imaging applications. I propose CMOS standard infrared plasmonic filters offering horizontal scalability, narrow spectral width, micron size thickness, and high transmission features. In the second part of this thesis, I explore time-resolved and spatially-resolved multispectral imaging systems for acquiring spatial image information in selective spectral bands. I substantiate the findings from the plasmonic filter simulations by experimentally realizing the novel LWIR plasmonic filters. Their instrumentation is explored by stacking into thermal image sensors through a filter wheel, and by integrating the filter mosaic into the camera to make a compact single-sensor imaging system. 
I experimentally demonstrate their time- or spatial-multiplexing performance in real time and recover high-resolution multispectral images with deep imaging. In the third part of this thesis, I develop a deep learning-based LWIR imaging spectroscopy system prototype for acquiring more spectral information with selective spatial images in real time. This is a computational LWIR spectral imaging system obtained by the joint design of a snapshot multispectral imager at the hardware front, and a novel deep learning-based algorithmic spectroscopy concept for rapid spectral reconstruction at the software front. Snapshot images are acquired in selective spectral bands using LWIR plasmonic filters stacked onto multiple detectors, which are further processed with a deep neural network architecture to rapidly predict the spectra. The power of our deep learning-based imaging spectrometer is experimentally demonstrated by identifying four minerals: amethyst, calcite, pyrite, and quartz. The proposed technique is a simple and approximate 'uncooled LWIR thermal hyperspectral imaging system', which can be used to identify multiple objects by retrieving the spectral fingerprint in a real scene without recording a large number of images and without needing an active blackbody source. I thus demonstrate next-generation thermal sensing systems by merging nanoplasmonic sensors and artificial intelligence. Our results will form the basis for snapshot, lightweight, compact, and low-cost hyperspectral LWIR imagers enabling diverse applications in chemical detection, precision agriculture, disease diagnosis, environmental sensing and industry vision.
  • Item
    Information-theoretic Analysis For Machine Learning and Transfer Learning: Bounds and Applications
    Wu, Xuetong ( 2023-03)
    Traditional machine learning is characterized by the assumption that the training data and target data are drawn from the same distribution. However, in practice, obtaining these data may be expensive and difficult. Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different distributions. Domain adaptation problems are widely investigated, and the resulting methods are used to improve the predictive results for one domain by transferring useful information from another (possibly) related domain where it is easy and cheap to obtain the data. Therefore, developing high-performance transfer learning techniques is necessary. One may ask: how do we guarantee that transfer learning is useful and efficient? In this thesis, we investigate the learning performance of transfer learning algorithms from an information-theoretic perspective, where one broad line of work considers the learning setting in which, during the training phase, we only have access to labelled data from the source distribution mu, possibly with some additional unlabelled or labelled data from the target distribution mu' that we are interested in during the testing phase. A popular approach in this context is to formulate a measure of discrepancy between the distributions mu and mu' and to give test error bounds in terms of this discrepancy. In this sense, we are particularly interested in the generalization error, which is defined as the difference between the empirical training loss and the population loss under mu' for a given algorithm, and this quantity indicates if the output hypothesis of the algorithm has been overfitted (or underfitted). This quantity can be viewed as the distribution (over both data and algorithm) divergences between the training and testing phases. From this perspective, the information-theoretic approach will benefit from different perspectives. 
In this thesis, we first give a review of information-theoretic analysis for generalization error in traditional machine learning problems with identical training and testing data distributions. We then propose a fast generalization framework that enhances learning performance by identifying the key conditions and improving the learning rate, where the improvement shifts the typical information-theoretic bounds from sublinear convergence to linear convergence. Next, we extend this analysis to transfer learning under various learning settings, viewed from different perspectives. Initially, we use the variational representation of KL divergence to derive upper bounds for general transfer learning algorithms under the batch learning setting. These data-algorithm-dependent bounds offer valuable insights into the impact of domain divergence on generalization ability. We then extend the batch learning setting to the online learning setting, viewed from a Bayesian perspective, and consider transfer learning under the supervised learning setting. We view prediction from a causal perspective using the proposed potential outcome framework and derive corresponding excess risks under different distribution shifting scenarios. These bounds are useful in orienting general transfer learning problems and identifying whether transfer learning is practical. To demonstrate the practical applications of our theoretical results, we propose bound-based algorithms and show their versatility in real-world problems.
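For readers unfamiliar with the variational representation of KL divergence mentioned above, the standard Donsker-Varadhan form and the elementary bound it yields are sketched below. The symbols P, Q, g, the loss function, and the sub-Gaussian parameter sigma are notation assumed here for illustration; this is the generic textbook argument, not the specific bounds derived in the thesis.

```latex
% Donsker-Varadhan variational representation (supremum over measurable g
% with E_Q[e^{g}] finite):
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sup_{g}\;
  \Big\{ \mathbb{E}_{P}[g(X)] \;-\; \log \mathbb{E}_{Q}\big[e^{g(X)}\big] \Big\}

% Assume the loss \ell is \sigma-sub-Gaussian under Q, i.e. for all real \lambda:
\log \mathbb{E}_{Q}\Big[e^{\lambda(\ell(X) - \mathbb{E}_{Q}[\ell(X)])}\Big]
  \;\le\; \frac{\lambda^{2}\sigma^{2}}{2}

% Choosing g = \lambda(\ell - \mathbb{E}_{Q}[\ell]) and optimising over \lambda gives
\big|\, \mathbb{E}_{P}[\ell(X)] - \mathbb{E}_{Q}[\ell(X)] \,\big|
  \;\le\; \sqrt{2\sigma^{2}\, D_{\mathrm{KL}}(P \,\|\, Q)}
```

Read with Q as the source distribution mu and P as the target distribution mu', the last line shows in the simplest case how a divergence between domains limits the gap between expected losses.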
  • Item
    Printed Radio Transceivers
    Walla, Andrew Lewis ( 2022-10)
    The application of additive manufacturing technologies in the field of radiofrequency electronics is explored. Low-cost manufacturing technology is applied at the component level to fabricate P-N junctions and combined with other components to form a simple printed radio transceiver. An analysis is performed for large-scale antenna arrays - amenable to low-resolution additive manufacturing technologies - with a view to further improve the capabilities of printed radiofrequency systems. A low-resolution PCB printer with a minimum feature size in the order of 200 micrometres and thermal curing temperature up to 515 kelvin is demonstrated capable of fabricating P-N junctions suitable for application as a varactor at 100 MHz to 2 GHz. P-N junctions were formed by printing N-type inks with inorganic ionic compounds (zinc oxide and diindium trioxide) and P-type inks with conjugated polymers (PEDOT:PSS and pentacene) as the active ingredients. For two P-N junction chemistries (the first combining PEDOT:PSS with diindium trioxide, the second combining PEDOT:PSS with zinc oxide), device impedance characteristics showed promise for application in a 50 ohm system operating at UHF frequencies. These inks were characterised with a source meter to demonstrate a current-to-voltage characteristic according to the Shockley equation and a network analyser to quantify the depletion region capacitance and equivalent series resistance as a function of frequency and bias voltage. A simple radio transceiver was demonstrated by printing a P-N junction as the terminating impedance to a printed antenna. Fabrication was completed in less than four hours and did not require temperatures exceeding 500 kelvin. As a transmitter, load modulation was applied to the P-N junction (a varactor) to phase modulate an incident radiofrequency signal before reradiating. 
Over-the-air measurements quantifying the power received from the transmitter as a function of distance, carrier frequency, signal bandwidth and BER for a 2FSK signal demonstrated reasonable agreement with a Friis-equation-based path loss model. As a receiver, an AM signal received by the antenna experienced self-mixing due to the nonlinear current versus voltage characteristic of the P-N junction. The mathematics describing the harmonic content of the demodulated signal showed reasonable agreement with SPICE simulation data and empirical measurements. To overcome inverse quartic losses that commonly afflict such backscatter-based systems, antenna array techniques were demonstrated as a viable strategy. The reflected signal from an infinite size, infinitely dense antenna array was shown to converge to an optical mirror (i.e. inverse square losses) under certain circumstances. Further reductions in radiative power loss are obtained under the condition of (retrodirective) phase conjugate matching. Under ideal conditions, a planar array of isotropic antennas can return up to 25% of the energy from an isotropic source to a collocated receiver. Accounting for incomplete coverage area, improperly specified apertures, cable loss and antenna efficiency, the model was compared against FEM simulation data and empirical measurements for a linear array, showing reasonable agreement. Thus, low-resolution additive manufacturing was demonstrated as a viable technology for fabricating radiofrequency systems. Low-cost printers in conjunction with semiconducting and conducting inks may be used to fabricate simple radio transceivers, which may be combined in retrodirective arrays for improved performance characteristics.
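The inverse square and inverse quartic losses referred to above follow from the free-space Friis equation applied once (one-way link) or twice (reader to tag and back). The sketch below assumes ideal free-space propagation, linear (not dB) gains, and a hypothetical `modulation_efficiency` factor for the reradiated fraction; the function names are illustrative and this is not the path loss model fitted in the thesis.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def friis_received_power(p_tx_w: float, gain_tx: float, gain_rx: float,
                         freq_hz: float, distance_m: float) -> float:
    """One-way free-space received power (watts) from the Friis equation."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return p_tx_w * gain_tx * gain_rx * (wavelength / (4.0 * math.pi * distance_m)) ** 2


def backscatter_received_power(p_tx_w: float, gain_reader: float, gain_tag: float,
                               freq_hz: float, distance_m: float,
                               modulation_efficiency: float = 1.0) -> float:
    """Round-trip power for a collocated transmitter and receiver.

    The forward link delivers power to the tag, a fraction of which
    (modulation_efficiency, an assumed lumped factor) is reradiated over
    the same distance, giving an overall 1 / d^4 dependence.
    """
    at_tag = friis_received_power(p_tx_w, gain_reader, gain_tag, freq_hz, distance_m)
    return friis_received_power(at_tag * modulation_efficiency,
                                gain_tag, gain_reader, freq_hz, distance_m)


# Doubling the distance costs a factor of 4 one-way and 16 round-trip.
one_way_ratio = (friis_received_power(1.0, 1.0, 1.0, 915e6, 1.0)
                 / friis_received_power(1.0, 1.0, 1.0, 915e6, 2.0))
round_trip_ratio = (backscatter_received_power(1.0, 1.0, 1.0, 915e6, 1.0)
                    / backscatter_received_power(1.0, 1.0, 1.0, 915e6, 2.0))
```

The 915 MHz carrier is only an example value within the UHF range; any frequency gives the same distance ratios.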
  • Item
    Optimal Detection and Estimation for a Sinusoidal Signal with Randomly Varying Phase and Frequency
    Liu, Changrong ( 2023-03)
    This thesis focuses on the detection of a sinusoidal signal with randomly varying frequency and phase. Such signals are encountered in a wide range of applications including radar, both active and passive sonar, sensor systems, underwater frequency line tracking, communication systems including frequency modulation techniques and optical communication. The specific motivation for the work presented in this thesis concerns the detection of continuous gravitational waves using the Laser Interferometer Gravitational-Wave Observatory (LIGO) sensor system. Continuous gravitational waves have not yet been discovered. Theory suggests that they are sinusoidal signals with a slowly, randomly wandering frequency. Moreover, the signal-to-noise ratio for a continuous gravitational wave observed with the LIGO sensor is extremely small and detection is expected to require coherent integration over a period of one year or more. Hence, the need for the most sensitive optimal detection technique for sinusoidal signals with slowly randomly varying frequency is clear. In this thesis, we study this detection problem in great detail, covering techniques such as a hidden Markov model based detector, optimal Bayesian detectors implemented using Markov chain Monte Carlo methods, optimal likelihood ratio detectors using the estimator-correlator structure and nonlinear optimal filtering, and finally, a least-squares-based detector implemented using optimal control of a bilinear system. The thesis contains many new results and presents comparisons with more traditional detectors developed in the past. The thesis also reviews methods which have been developed over the past 70 years for estimating and tracking sinusoidal signals with varying frequency, including the well-known phase-locked loop, which is known to be closely related to the extended Kalman filter solution. 
While many papers have appeared on the problem of estimating the frequency of a sinusoidal signal, very few have addressed the problem of optimal detection of such signals. That said, optimal detectors are often based on optimal estimation, so much of the work in this thesis deals with the estimation problem.
  • Item
    Information Theory and Machine Learning: A Coding Approach
    Wan, Li ( 2022-11)
    This thesis investigates the principles of using information theory to analyze and design machine learning algorithms. Despite recent successes, deep (machine) learning algorithms are still heuristic, vulnerable, and black-box. For example, it is still not clear why and how deep learning works so well, and neural networks have been observed to be very vulnerable to adversarial attacks. On the other hand, information theory is a well-established scientific discipline with a strong foundation in mathematical tools and theorems. Both machine learning and information theory are data orientated, and their inextricable connections motivate this thesis. Focusing on data compression and representation, we first present a novel, lightweight supervised dictionary learning framework for text classification. Our two-stage algorithm emphasizes the conceptual meaning of dictionary elements in addition to classification performance. A novel metric, information plane area rank (IPAR), is defined to quantify the information-theoretic performance. Extensive experiments on six benchmark text datasets, in which the algorithm is compared with multiple state-of-the-art algorithms, show promising classification accuracy. The resulting dictionary elements (atoms) with conceptual meanings are displayed to provide insights into the decision processes of the learning system. Our algorithm achieves competitive results on certain datasets while using up to ten times fewer parameters. Motivated by the similarity between communication systems and adversarial learning, we next investigate a coding-theoretic approach to increase adversarial robustness. Specifically, we develop two novel defense methods (eECOC and NNEC) based on error-correcting codes. The first method uses efficient error-correcting output codes (ECOCs), which encode the labels in a structured way to increase adversarial robustness.
The second method is an encoding structure that increases the adversarial robustness of neural networks by encoding the latent features. Codes based on Fibonacci lattices and variational autoencoders are used in the encoding process. Both methods are validated on three benchmark datasets: MNIST, FashionMNIST, and CIFAR-10. An ablation study is conducted to compare the effectiveness of different encoding components. Several distance metrics and t-SNE visualization are used to give further insights into how these coding-theoretic methods increase adversarial robustness. Our work indicates the effectiveness of using information theory to analyze and design machine learning algorithms. The strong foundation of information theory provides opportunities for future research in the areas of data compression and adversarial robustness.