Electrical and Electronic Engineering - Research Publications

Search Results

Now showing 1 - 4 of 4
  • Item
    A Bayesian approach to (online) transfer learning: Theory and algorithms
    Wu, X ; Manton, JH ; Aickelin, U ; Zhu, J (Elsevier BV, 2023-11)
    Transfer learning is a machine learning paradigm in which knowledge from one problem is used to solve a new but related problem. While it is conceivable that knowledge from one task could help solve a related task, transfer learning algorithms, if not executed properly, can impair learning performance rather than improve it, a phenomenon commonly known as negative transfer. In this paper, we use a parametric statistical model to study transfer learning from a Bayesian perspective. Specifically, we study three variants of the transfer learning problem: instantaneous, online, and time-variant transfer learning. For each problem we define an appropriate objective function and provide either exact expressions or upper bounds on the learning performance using information-theoretic quantities, which allow simple and explicit characterizations when the sample size becomes large. Furthermore, examples show that the derived bounds are accurate even for small sample sizes. The obtained bounds give valuable insights into the effect of prior knowledge on transfer learning, at least with respect to our Bayesian formulation of the problem. In particular, we formally characterize the conditions under which negative transfer occurs. Lastly, we devise several (online) transfer learning algorithms that are amenable to practical implementation, some of which do not require the parametric assumption. We demonstrate the effectiveness of our algorithms on real data sets, focusing primarily on settings in which the source and target data are strongly similar.
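
The abstract above describes a parametric Bayesian treatment of transfer learning and the conditions under which negative transfer occurs. As a minimal, hedged illustration (not the paper's model or algorithm), the sketch below uses a conjugate Gaussian location model in which a source-informed prior is updated with a small target sample; a prior centred far from the target parameter drags the estimate away from the truth, which is one elementary way negative transfer can show up. The prior means and variances are arbitrary illustrative values.

```python
# Hedged sketch: Bayesian transfer in a Gaussian location model with known
# noise variance. A prior centred on a source estimate is updated with target
# samples; a badly mismatched source prior biases the estimate.
import numpy as np

def posterior_mean(target_x, prior_mean, prior_var, noise_var=1.0):
    """Conjugate Gaussian update: prior N(prior_mean, prior_var), data x_i ~ N(theta, noise_var)."""
    n = len(target_x)
    post_precision = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var + target_x.sum() / noise_var) / post_precision
    return post_mean, 1.0 / post_precision

rng = np.random.default_rng(0)
theta_target = 0.0
x = rng.normal(theta_target, 1.0, size=10)   # small target sample

similar, _    = posterior_mean(x, prior_mean=0.2, prior_var=0.5)  # source close to target
dissimilar, _ = posterior_mean(x, prior_mean=3.0, prior_var=0.5)  # source far from target
target_only   = x.mean()

print(f"target-only estimate:         {target_only:+.3f}")
print(f"with similar source prior:    {similar:+.3f}")
print(f"with dissimilar source prior: {dissimilar:+.3f}  (negative transfer)")
```
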
  • Item
    Fast Rate Generalization Error Bounds: Variations on a Theme
    Wu, X ; Manton, JH ; Aickelin, U ; Zhu, J (IEEE, 2022)
    A recent line of work, initiated by [1] and [2], has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form O(√(λI/n)), where λ is an assumption-dependent coefficient and I is an information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered "slow" compared with a "fast rate" of O(1/n) in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and that a fast-rate result can still be obtained from this bound by evaluating λ under an appropriate assumption. Furthermore, we identify the key condition needed for fast-rate generalization error, which we call the (η, c)-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of O(1/n) for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to show the effectiveness of the bounds.
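
For context on the two rates contrasted above, the following is a hedged LaTeX restatement: the displayed inequality is the well-known square-root bound of Xu and Raginsky for σ-sub-Gaussian losses, an instance of the O(√(λI/n)) form, while the fast-rate form is only schematic; the paper's exact conditions and constants are not reproduced here.

```latex
% A standard instance of the square-root bound (Xu & Raginsky, 2017) for a
% sigma-sub-Gaussian loss, matching the O(sqrt(lambda * I / n)) form above:
\[
  \bigl|\,\mathbb{E}\bigl[\mathrm{gen}(\mu, P_{W\mid S})\bigr]\bigr|
    \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W;S)}{n}},
\]
% where I(W;S) is the mutual information between the training sample S of size
% n and the learned hypothesis W. The fast-rate results described in the
% abstract instead give bounds whose dependence on n is O(1/n) under the
% (eta, c)-central condition (schematic form only; see the paper for the
% precise statements and constants).
```
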
  • Item
    An Information-Theoretic Analysis for Transfer Learning: Error Bounds and Applications
    Wu, X ; Manton, JH ; Aickelin, U ; Zhu, J (2022-07-12)
    Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different probability distributions. In this work, we give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms, following a line of work initiated by Russo and Xu. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence D(μ||μ′) plays an important role in the characterizations, where μ and μ′ denote the distributions of the training data and the testing data, respectively. Specifically, we provide generalization error upper bounds for the empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the analysis to approximate ERM methods such as the Gibbs algorithm and the stochastic gradient descent method. We then generalize the mutual information bound with ϕ-divergence and Wasserstein distance. These generalizations lead to tighter bounds and can handle the case when μ is not absolutely continuous with respect to μ′. Furthermore, we apply a new set of techniques to obtain an alternative upper bound that gives a fast (and optimal) learning rate for some learning problems. Finally, inspired by the derived bounds, we propose the InfoBoost algorithm, in which the importance weights for the source and target data are adjusted adaptively in accordance with information measures. The empirical results show the effectiveness of the proposed algorithm.
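
The InfoBoost algorithm mentioned above adjusts source and target importance weights adaptively using information measures; its update rule is not given in the abstract and is not reproduced here. The sketch below is only a generic weighted-ERM baseline for the same setting, with a hypothetical fixed weight `alpha` standing in for the adaptive weights.

```python
# Hedged sketch of weighted ERM over pooled source and target data, the general
# setting the abstract analyses. The fixed `alpha` is a placeholder for
# InfoBoost's adaptive, information-measure-driven weights.
import numpy as np

def weighted_erm_logreg(Xs, ys, Xt, yt, alpha=0.5, lr=0.1, epochs=300):
    """Logistic regression minimising alpha * (source risk) + (1 - alpha) * (target risk)."""
    w = np.zeros(Xs.shape[1])

    def grad(X, y, w):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        return X.T @ (p - y) / len(y)      # gradient of the mean log-loss

    for _ in range(epochs):
        w -= lr * (alpha * grad(Xs, ys, w) + (1 - alpha) * grad(Xt, yt, w))
    return w

# Toy usage: plentiful source data, a handful of target samples from a shifted domain.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -1.0])
Xs = rng.normal(0.0, 1.0, size=(500, 2)); ys = (Xs @ w_true > 0).astype(float)
Xt = rng.normal(0.5, 1.0, size=(30, 2));  yt = (Xt @ w_true > 0).astype(float)
w_hat = weighted_erm_logreg(Xs, ys, Xt, yt, alpha=0.3)
```
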
  • Item
    Information-theoretic analysis for transfer learning
    Wu, X ; Manton, JH ; Aickelin, U ; Zhu, J (IEEE, 2020)
    Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different distributions (denoted μ and μ', respectively). In this work, we give an information-theoretic analysis of the generalization error and the excess risk of transfer learning algorithms, following a line of work initiated by Russo and Zou. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence D(μ||μ') plays an important role in characterizing the generalization error in the setting of domain adaptation. Specifically, we provide generalization error upper bounds for general transfer learning algorithms, and extend the results to a specific empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the method to iterative, noisy gradient descent algorithms and obtain upper bounds that can be computed easily, using only parameters of the learning algorithms. A few illustrative examples are provided to demonstrate the usefulness of the results. In particular, our bound is tighter than the bound derived using Rademacher complexity in specific classification problems.
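
Both transfer-learning abstracts above centre on the KL divergence D(μ||μ') between the training and testing distributions. As a small, self-contained illustration (not taken from the papers), the snippet below evaluates this term in closed form for univariate Gaussians, showing how the distribution-shift quantity that enters the bounds grows as the test distribution moves away from the training distribution.

```python
# Hedged illustration of the shift term D(mu || mu') for univariate Gaussians,
# where the closed form is standard; the papers' bounds apply to general
# training/test distributions.
import numpy as np

def kl_gaussians(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# No shift vs. increasingly shifted test distributions: the KL term grows with
# the shift, loosening the transfer-learning generalization bound.
print(kl_gaussians(0.0, 1.0, 0.0, 1.0))   # 0.0
print(kl_gaussians(0.0, 1.0, 1.0, 1.0))   # 0.5
print(kl_gaussians(0.0, 1.0, 3.0, 1.0))   # 4.5
```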