With a good understanding of data, deep generative models can create remarkable data, including vivid images and human-readable articles. But to fully utilise their understanding in tasks beyond synthesis, deep generative models need more careful training and usage.

Having a good understanding of data greatly improves AI's efficiency: the AI can reuse that understanding to solve many tasks arising from the data, instead of solving each specific task by retraining a specific model. Recently, deep generative models have shown promise as a generic protocol for AI to learn such data understanding. However, popular implementations of deep generative models can only train and utilise the learnt understanding for very few tasks, even though the learnt models are capable of much more.

At Boltzbit, we develop our own implementation of deep generative models with our unique model training and inference technology. Our implementation absorbs the good designs of other implementations while supporting deep generative models in training and inference on all tasks arising from the data, maximising the generative model's power.

"Comprehension is compression. You compress things into computer programs, into concise algorithmic descriptions. The simpler the theory, the better you understand something."

- Gregory Chaitin, "The Limits of Reason".

Imagine what you would do if asked to draw a cat: you first come up with an "idea" in your mind of what a cat looks like, then control your hand to follow that "idea" and fill in the details. The "idea" in your mind is a high-level compressed representation of a cat image, while your hand is like an unzip tool that decodes the compressed representation into an actual data instance. What's more interesting is that your "idea" of a cat is not just used for drawing: you call up the same "idea" to tell a cat from a dog, to find a cat hidden in a crowd, or to recognise a cat in a blurry old video.

We see that for us humans, having a good understanding of data greatly improves our efficiency in processing data-relevant information. The same holds for AI. Traditional AI systems that build models for specific tasks learn limited understandings of data that apply only to those tasks. The learnt understanding cannot be reused to solve other tasks arising from the same data; instead, a new model has to be trained from scratch each time. A more efficient AI system should form a deeper "idea" of the data it learns and reuse that "idea" whenever possible, just like humans do.

The question then is how to let AI acquire such a good understanding of data, as humans do. Recent research shows that deep generative models have great potential to become a generic tool for building understandings of data. Mimicking human behaviour, a deep generative model is asked to explicitly form an efficient, compact representation (the "idea") of the data, from which it should be able to create synthetic data instances indistinguishable from real ones. The higher the quality of the synthetic data, the deeper the understanding the model has obtained. Powered by deep learning, deep generative models are now capable of building efficient understandings of complex real data, including images, audio and text, and can enhance the AI systems built on them.

Mimicking humans, a deep generative model creates a high-quality cat image by passing its understanding of "cat" (here, a short code) through a deep neural network.

Nevertheless, this superior modelling power comes at a price. Compared to task-specific models, training deep generative models is more difficult, as we no longer have task-specific labels to supervise the models in learning their data understanding. Applying deep generative models is also more challenging, as the way to infer data understanding differs considerably between tasks. In this section, we discuss how current implementations of deep generative models achieve model training, and leave model usage to the next section.

A deep generative model represents its understanding of data, also called the latent representation, as a short code that looks like randomly sampled numbers, usually denoted \(z\). The model then uses a deep neural network \(G_\theta\) to convert the code into an actual data instance \(x = G_\theta(z)\), completing the generation process. Training adjusts the parameters \(\theta\) of the network so that high-quality data can be synthesised.
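As a toy illustration of this generation process, the sketch below uses a tiny two-layer numpy network as a stand-in for \(G_\theta\). The dimensions, weights and activation choices are arbitrary assumptions made for illustration, not taken from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy generator G_theta: a two-layer network mapping a
# low-dimensional latent code z to a flattened 8x8 "image".
latent_dim, hidden_dim, data_dim = 4, 16, 64
W1 = rng.normal(scale=0.5, size=(hidden_dim, latent_dim))
W2 = rng.normal(scale=0.5, size=(data_dim, hidden_dim))

def generate(z):
    """Decode a latent code z into a data instance x = G_theta(z)."""
    h = np.tanh(W1 @ z)      # non-linear hidden layer
    return np.tanh(W2 @ h)   # output in [-1, 1], like a normalised image

z = rng.normal(size=latent_dim)  # the "understanding": a short random-looking code
x = generate(z)
print(x.shape)                   # a full data instance decoded from the code
```

Sampling a fresh \(z\) and decoding it yields a new synthetic instance, which is all that data synthesis requires once \(\theta\) is trained.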

Currently, there exist two popular implementations of deep generative models: Variational Auto-encoders (VAEs) and Generative Adversarial Networks (GANs). They take very different approaches to training.

A VAE learns its understanding of the data by examining data instances one by one, drawing ideas from their commonalities and differences. To achieve this, a VAE introduces, and trains at the same time, an auxiliary deep neural network called the *inference network*. The inference network can tell, for each data instance, which other instances the generative model thinks should have similar representations. With similar data instances grouped together in the representation space of \(z\), a VAE is able to separate the common from the instance-specific content of the data, arrange the data in the representation space appropriately, and build a compact understanding of the whole dataset.

A VAE is trained to build the understanding for each data instance one by one.
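The per-instance objective a VAE maximises (the evidence lower bound, or ELBO) can be sketched as follows. Linear maps stand in for the deep inference and generator networks purely for brevity; all names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for the VAE's networks (real VAEs use deep networks).
latent_dim, data_dim = 2, 8
W_enc = rng.normal(scale=0.1, size=(2 * latent_dim, data_dim))  # inference net -> [mu, log_var]
W_dec = rng.normal(scale=0.1, size=(data_dim, latent_dim))      # generator / decoder

def elbo(x):
    # Inference network: map the instance to a distribution over codes z.
    stats = W_enc @ x
    mu, log_var = stats[:latent_dim], stats[latent_dim:]
    # Reparameterisation trick: sample z so gradients can flow through.
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=latent_dim)
    x_rec = W_dec @ z                        # decode the sampled code
    rec = -0.5 * np.sum((x - x_rec) ** 2)    # Gaussian reconstruction term
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))  # KL(q(z|x) || p(z))
    return rec - kl                          # the ELBO, maximised per instance

x = rng.normal(size=data_dim)
print(elbo(x))
```

The reconstruction term pulls each instance's code toward something that decodes back to the instance, while the KL term keeps the codes organised in a shared representation space, which is how the commonalities and differences get separated.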

A GAN learns its understanding of the data by capturing the whole data distribution directly. The premise is that a well-trained generative model should arrange its understanding of the data to reflect every detail of the data distribution. To achieve this, a GAN also introduces and simultaneously trains an auxiliary deep neural network, just like a VAE. The difference is that this auxiliary network's role is to examine, from all possible perspectives, whether the generative model outputs a data distribution precisely matching the true one. This auxiliary network is called the *discriminator*, and training completes when it can no longer tell the difference between the output distribution and the true data distribution.

A GAN is trained to match the whole distribution of synthetic data to the true data distribution.
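The adversarial objective behind this can be sketched with the two losses below. Again, linear maps stand in for the deep generator and discriminator, and every name and dimension is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins for the GAN's two networks.
latent_dim, data_dim = 2, 4
G = rng.normal(size=(data_dim, latent_dim))   # generator weights
d = rng.normal(size=data_dim)                 # discriminator weights

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def discriminator_loss(x_real):
    z = rng.normal(size=latent_dim)
    x_fake = G @ z
    # The discriminator wants real instances scored 1 and synthetic ones 0.
    return -(np.log(sigmoid(d @ x_real)) + np.log(1 - sigmoid(d @ x_fake)))

def generator_loss():
    z = rng.normal(size=latent_dim)
    # The generator wants its samples to be classified as real.
    return -np.log(sigmoid(d @ (G @ z)))

print(discriminator_loss(rng.normal(size=data_dim)), generator_loss())
```

Alternately lowering these two losses is the adversarial game: training ends at the point where the discriminator can no longer separate synthetic from real instances.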

Deep generative models implemented as GANs and VAEs have achieved great success in synthesising high-quality complex data such as images, text and audio. However, data synthesis is only the most straightforward of many tasks that generative models could efficiently solve with their well-established data understanding. For these other tasks, GANs and VAEs provide only limited solutions, or even leave the problem undefined, even though the underlying deep generative models are capable of solving them.

In general, these tasks can be grouped into three categories: data compression, prediction, and missing value completion.

**Data compression** involves compressing complex data into fewer bits. With a deep generative model, compressing a specific data point \(x\) amounts to finding the exact latent representation \(z\) that generates \(x\). Neither GANs nor VAEs fully support data compression. GANs do not define a way to infer the latent representation from a data instance. VAEs can use their auxiliary inference networks to obtain the compressed bits, but those bits are often not accurate enough to serve as good compressions.
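One generic way to recover the code behind an instance is to search over \(z\) by gradient descent on the reconstruction error. The sketch below does this for a linear stand-in generator, where the gradient has a simple closed form; with a deep generator the same search would use automatic differentiation. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

latent_dim, data_dim = 3, 10
G = rng.normal(size=(data_dim, latent_dim))  # stand-in linear generator

z_true = rng.normal(size=latent_dim)
x = G @ z_true                       # the instance we want to compress

z = np.zeros(latent_dim)             # start from an arbitrary code
for _ in range(5000):
    grad = G.T @ (G @ z - x)         # gradient of 0.5 * ||G z - x||^2 w.r.t. z
    z -= 0.02 * grad                 # gradient-descent step on the code

print(np.linalg.norm(G @ z - x))     # reconstruction error shrinks toward 0
```

The recovered \(z\) is a compressed representation of \(x\): it has 3 numbers instead of 10, and decoding it through the generator reproduces the instance.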

**Prediction** involves predicting unknown values from the known contents of data, while **missing value completion** involves filling in values we expect to see in the data but which are missing for some reason. Prediction is a special case of missing value completion, as the values to predict can also be treated as missing. To fill the missing values of a data instance, we can use a deep generative model to synthesise instances that match the target instance on the observed positions, then use the synthetic values to fill the unseen positions. Synthesising such similar data again requires a good understanding of the data, which deep generative models can provide. However, GANs cannot infer data understandings at all, and the inference networks in VAEs do not accept incomplete data instances as input.
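The completion recipe above can be sketched by fitting the code \(z\) against the observed positions only, then reading the missing positions off the synthetic instance. With the linear stand-in generator this fit is a least-squares solve; with a deep generator it would become a gradient-based search over \(z\). Everything here is an illustrative assumption, not a real pipeline.

```python
import numpy as np

rng = np.random.default_rng(4)

latent_dim, data_dim = 3, 10
G = rng.normal(size=(data_dim, latent_dim))  # stand-in linear generator

x_full = G @ rng.normal(size=latent_dim)     # ground truth (normally unknown)
mask = np.ones(data_dim, dtype=bool)
mask[[2, 5, 7]] = False                      # entries 2, 5 and 7 are missing

# Fit the code z so the synthetic instance matches the observed entries.
z, *_ = np.linalg.lstsq(G[mask], x_full[mask], rcond=None)

# Keep the observed values; fill the unseen positions with synthetic values.
x_completed = np.where(mask, x_full, G @ z)
print(np.linalg.norm(x_completed - x_full))  # small: the gaps were filled well
```

Because the generator's understanding ties all positions of an instance together through one shared code, matching the observed positions is enough to pin down plausible values for the missing ones.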

We see that deep generative models have potentially *unlimited* power if their data understanding is fully utilised in tasks beyond data synthesis. VAEs and GANs, however, fail to fully utilise this understanding, and so cannot maximise the power of deep generative models.

At Boltzbit, we have developed our own unique implementation of deep generative models. It is based on Ergodic Inference (EI), a method that can efficiently infer the accurate latent representation of any given data point, complete or incomplete, *without* introducing extra auxiliary networks. Training a generative model in our framework is similar to a VAE, but with the auxiliary networks replaced by the EI engine. To obtain better models and training outcomes, we also focus our research on the automated design of model structures and training losses that adapt to different types of data.

Boltzbit's unique implementation of deep generative models for training.

When applying deep generative models to data understanding tasks, our approach uses the EI engine to extract the understanding of each data instance while respecting any missing values it may contain. In our implementation, data synthesis and all other data-related tasks can be solved in the same pipeline, using the same deep generative model, without any extra network design or training.

With Boltzbit's EI engine, we can now solve different tasks using the same pipeline.

At Boltzbit, we continue to develop our unique technology to train generative models more efficiently in building their data understanding, and to fully utilise that understanding in solving all kinds of data-related tasks, helping deep generative models build a truly comprehensive understanding of data.