Author: Smagulova, Kamilya
Date accessioned: 2021-03-31
Date available: 2021-03-31
Date issued: 2021-03
URI: http://nur.nu.edu.kz/handle/123456789/5354

Abstract: The growing amount of data, the end of Moore's law, and the need for machines with human intelligence have dictated several new concepts in computing and chip design. The physical limitations of Complementary Metal-Oxide-Semiconductor (CMOS) transistors and the von Neumann bottleneck show that in-memory computing devices built on beyond-CMOS technologies are needed. The architecture of the long short-term memory (LSTM) neural network makes it an ideal candidate for modern computing systems. The recurrent connections and built-in memory of the LSTM network also allow it to process different types of data, including data with temporal features and dependencies. The realization of LSTM and other artificial neural networks (ANNs) implies a large amount of parallel computation; therefore, in most cases, their training and inference are carried out on modern computing systems with the help of a graphics processing unit (GPU). In addition, several solutions exist for energy- and area-efficient inference of neural networks on field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) platforms, in both the digital and analog domains. In 2008, the discovery of a new device called the "memristor", which acts as an artificial synapse, drew attention to the development of memristive ANNs. Owing to their nanoscale size and non-volatile nature, memristor crossbar arrays (MCAs) perform dot-product multiplication several orders of magnitude faster while requiring a smaller area and lower energy consumption. Recent successful works in which memristors were used as a dot-product engine include "A convolutional neural network accelerator with in-situ analog arithmetic in crossbars" (ISAAC) and "A programmable ultra-efficient memristor-based accelerator for machine learning inference" (PUMA). Nevertheless, training ANNs on FPGAs and ASICs remains a challenging problem; therefore, the majority of memristive platforms are proposed only for the acceleration of neural networks with pre-trained parameters. In this thesis, the design of an analog CMOS-memristor accelerator implementing an LSTM recurrent neural network at the edge is proposed. The circuit design of a single LSTM unit consists of two main parts: 1) a dot-product engine based on a memristor crossbar array using a "one weight, two memristors" scheme; and 2) CMOS circuit blocks that realize the arithmetic and non-linear functions within the LSTM unit. The proposed design was validated on machine learning problems such as prediction and classification. The performance of the analog LSTM circuit was compared with that of other neural networks and neuromorphic systems, including a single perceptron, a feedforward neural network (FNN), a deep neural network (DNN), and a modified hierarchical temporal memory (HTM). In addition, analyses of memristor state variability in hybrid CNN-LSTM and CNN implementations for image classification were performed successfully.

Language: en
License: Attribution-NonCommercial-ShareAlike 3.0 United States
Keywords: Complementary Metal-Oxide Semiconductor; CMOS; long short-term memory; LSTM; artificial neural networks; ANNs; graphics processing unit; GPU
Title: DESIGN OF CMOS-MEMRISTOR CIRCUIT OF LSTM ARCHITECTURE
Type: PhD thesis
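
Note on the LSTM unit referenced in the abstract: a single LSTM unit computes the standard gate equations below (the conventional textbook formulation with the usual symbols, not one taken from the thesis). In the proposed design, the matrix-vector products W x_t and U h_{t-1} would map onto the memristor crossbar dot-product engine, while the sigmoid, tanh, and element-wise operations correspond to the CMOS circuit blocks.

\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
\]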
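
A minimal Python/NumPy sketch of the "one weight, two memristors" scheme named in the abstract, assuming the common differential mapping in which a signed weight is stored as the difference of two non-negative conductances, W proportional to (G_pos - G_neg); the input vector is applied as row voltages and, by Ohm's and Kirchhoff's laws, each column current sums the products. All function names and the conductance bounds g_min/g_max are illustrative assumptions, not values from the thesis.

import numpy as np

def map_weights_to_conductances(W, g_min=1e-6, g_max=1e-4):
    """Map a signed weight matrix onto two non-negative conductance matrices
    so that W is proportional to (G_pos - G_neg). The linear mapping and the
    device limits g_min/g_max (in siemens) are assumptions for illustration."""
    scale = (g_max - g_min) / np.max(np.abs(W))   # weights -> conductance range
    G_pos = np.where(W > 0, g_min + W * scale, g_min)
    G_neg = np.where(W < 0, g_min - W * scale, g_min)
    return G_pos, G_neg, scale

def crossbar_dot_product(G_pos, G_neg, v_in, scale):
    """Analog dot product: row voltages v_in drive the crossbar, each column
    collects current I = G^T v (Ohm's and Kirchhoff's laws), and subtracting
    the paired columns recovers the signed result W^T v after rescaling."""
    i_pos = G_pos.T @ v_in                        # column currents, positive array
    i_neg = G_neg.T @ v_in                        # column currents, negative array
    return (i_pos - i_neg) / scale                # back to the weight domain

# Usage: the analog model matches a direct matrix-vector product.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                       # signed weights: 4 inputs, 3 outputs
v = rng.normal(size=4)                            # input voltage vector
G_pos, G_neg, scale = map_weights_to_conductances(W)
print(np.allclose(crossbar_dot_product(G_pos, G_neg, v, scale), W.T @ v))  # True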
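
Continuing the sketch above, the memristor state variability analysis mentioned at the end of the abstract can be mimicked by perturbing the programmed conductances. Multiplicative Gaussian noise with an assumed 5% deviation is used here purely for illustration; it is not the thesis's device model.

def perturb_conductances(G, sigma=0.05, rng=None):
    """Model memristor state variability as multiplicative Gaussian noise on
    each programmed conductance (sigma = 5% is an assumed deviation)."""
    rng = np.random.default_rng() if rng is None else rng
    return G * rng.normal(1.0, sigma, size=G.shape)

# Effect of device variation on the analog dot product:
y_ideal = crossbar_dot_product(G_pos, G_neg, v, scale)
y_noisy = crossbar_dot_product(perturb_conductances(G_pos, rng=rng),
                               perturb_conductances(G_neg, rng=rng), v, scale)
print(np.max(np.abs(y_noisy - y_ideal)))          # error grows with sigma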