GPU Accelerated Microwave Kinetic Inductance Detector (MKID) Readout System

Capstone Report
Viktor Makhrinov

Nazarbayev University
Department of Electrical and Computer Engineering
School of Engineering and Digital Sciences

Copyright © Nazarbayev University

This project report was created on the TeXstudio editing platform using LaTeX. All the figures were drawn using the draw.io online software tool.

Electrical and Computer Engineering
Nazarbayev University
http://www.nu.edu.kz

Title: GPU Accelerated Microwave Kinetic Inductance Detector (MKID) Readout System
Theme: MKIDs, GPU acceleration
Project Period: Fall 2023
Project Group: .
Participant(s): Viktor Makhrinov
Supervisor(s): Mehdi Shafiee
Copies: 1
Page Numbers: 24
Date of Completion: April 26, 2024

Abstract:
Microwave Kinetic Inductance Detectors (MKIDs) have proven to be valuable instruments for detecting weak signals in the microwave and millimeter-wave spectrum ranges in many scientific domains. Their novel operating principle, which is based on the detection of kinetic inductance changes in superconducting resonators, has led to applications in astronomy, quantum computing, and materials science. The efficient readout of MKID arrays, on the other hand, creates computational challenges and frequently necessitates the use of advanced data processing techniques. While effective, traditional readout systems based on Field Programmable Gate Arrays (FPGAs) have limitations in terms of flexibility and development simplicity. This study looks into the possibility of Graphics Processing Unit (GPU) acceleration to overcome these difficulties. Using an example from current research, this report demonstrates how GPU acceleration can improve the performance of MKID readout systems and make them easier to adjust. In addition to speeding up development, the integration of GPUs opens up novel opportunities for MKID applications across scientific disciplines.
The content of this report is freely available, but publication (with reference) may only be pursued with the agreement of the author(s).

Contents
Preface
1 Introduction
1.1 Working Principle and Applications of MKIDs
1.2 Limitations of FPGA-Based Readout Systems
2 Background
2.1 Frequency-Division Multiplexing (FDM) in MKID Readout
2.2 Advantages of GPU Acceleration for MKID Readout
2.3 Example from the Field
3 Methodology
3.1 Readout algorithm in the exemplary study
3.2 Accelerating MKID Readout with GPU
3.3 Replicate measurements from exemplary study
3.3.1 Experimental setup and system architecture
3.3.2 Performing measurements
4 Results
4.1 Readout of GHz frequency signals
4.2 Detection and measurements on 80 MHz signal
5 Conclusion
Bibliography

Preface

Nazarbayev University, April 26, 2024
Viktor Makhrinov

Chapter 1 Introduction

MKIDs, or microwave kinetic inductance detectors, are essential tools in modern scientific research. Because of their exceptional sensitivity in the microwave and millimeter-wave spectrum domains, these superconducting detectors are used in many disciplines, including astrophysics, quantum computing, and materials science.

1.1 Working Principle and Applications of MKIDs

MKIDs use a simple but effective approach: they monitor changes in the kinetic inductance of superconducting resonators caused by absorbed photons or particles. When photons or other particles strike an MKID resonator, they break Cooper pairs and increase the kinetic inductance.
This shift is detectable and serves as the foundation for several applications [1].

MKIDs have been employed in astrophysical observatories, both on the ground and in space, to detect and analyze cosmic microwave background radiation, study galaxy formation, and examine dark matter [2][3][4]. Because of their high sensitivity and multiplexing capabilities, they are suitable for such accurate astronomical observations. For instance, MKIDs would be beneficial for ground-based systems such as the Atacama Large Millimeter/submillimeter Array (ALMA) [5].

MKIDs are appealing candidates for quantum bits (qubits) in quantum computing due to their quantum-compatible properties [6]. They promote quantum information processing by combining high-fidelity readout with low-noise operation.

MKIDs have also been useful in materials research for determining material properties with spectroscopic approaches [7][8]. Their ability to detect minor differences in the electromagnetic properties of materials has implications for material characterization and design.

1.2 Limitations of FPGA-Based Readout Systems

When reading out an array of MKIDs, a field programmable gate array (FPGA) device is often used to generate a probe signal, which is a frequency comb made up of sinusoids. The comb is sent to the detector array, which is cooled in a dilution refrigerator, and each tone of the comb excites one resonator. Each excited resonator imprints its response on the corresponding probe tone. On the receiving end, a filter-bank channelizer separates and analyzes each resonant frequency. This filter bank, built on the same FPGA device, consists of polyphase FIR filters [9].

MKIDs are exceptionally powerful, but their precise readout creates computing challenges.
FPGAs have been widely used in conventional readout systems, but despite their exceptional capabilities, they have several intrinsic limitations:

Creating Complex Firmware: The development and modification of FPGA firmware requires a high level of competence as well as a large amount of time [10][11].
Limited Flexibility: FPGA-based systems may be unable to quickly adapt to changing experimental demands.
High Power Consumption: Warm multiplexed readout circuits can consume a lot of power, limiting their scalability for larger MKID arrays.
Cost of Expertise: Maintaining FPGA systems usually needs specialized knowledge, raising operating costs [10].
Crosstalk: High pixel density causes crosstalk due to electromagnetic coupling between the resonators [2], and its mitigation also needs to be adapted to the different MKID families, namely lumped-element KIDs [12] and lens-antenna coupled distributed MKIDs [13].

Due to these limitations, there is a rising demand for alternative readout techniques that combine computational effectiveness with development simplicity.

A recent study employed the MGTree technique, which is known for its remarkable performance in virtual source tree tracing for 3D ray tracing applications. It accomplishes this by combining CPU, GPU, and CUDA technologies in a single system, similar to the one used in the exemplary study for this paper. The approach, which uses MPI and CUDA, effectively utilizes the parallel processing power of both the CPU and the GPU, making it well suited for multi-operand calculations, global performance optimization, and job distribution. Notably, CUDA's connection to the GPU helps to reduce the burden on individual cores, ensuring scalability and optimal resource utilization. MGTree proves its worth in real-world applications, such as wireless channel simulations in an indoor conference room, where it outperforms traditional CPU and GPU methods.
This emphasizes the benefits of integrating CPU, GPU, and CUDA into a unified computing framework [14].

In light of the operating principles and applications of MKIDs, the limitations of conventional FPGA-based readout systems, and the benefits of GPU acceleration, this paper aims to show that GPUs offer a compelling opportunity to improve MKID readout. It discusses how MKID readout systems can solve present challenges and open up new opportunities by utilizing the parallel computing power, high memory bandwidth, scalability, accessibility, and ease of programming inherent in GPUs. The use of GPU acceleration in MKID readout is examined in greater detail, along with how it affects performance and adaptability. The goal is to show how GPU-accelerated MKID readout systems may enhance readout responsiveness and efficiency, expanding the possibilities of MKID detectors across several scientific disciplines.

Chapter 2 Background

2.1 Frequency-Division Multiplexing (FDM) in MKID Readout

It is important to understand the core concepts of MKID readout, in particular the function of Frequency-Division Multiplexing (FDM), before getting into the benefits of GPU acceleration. In MKID readout systems, the FDM technique is used to separate the signals coming from different resonators in an array. In an MKID array, each resonator responds at its own frequency. Multiple resonators can be read out simultaneously by using FDM, which assigns the resonators to discrete frequency bins [15][16].

The MKID readout algorithm typically goes through three stages:

Generating Probe Signals at Low Frequencies: To read an array of N microwave resonators, N different probe signals are initially produced, each at a different frequency.
The signals are first generated at a low frequency and are then shifted up to the required level, which coincides with the frequency of the resonators, by I/Q modulation [16][17].
I/Q Modulation: Up-converting a signal to a higher frequency involves using the I/Q modulation technique. The signal is mixed with a local oscillator (LO) signal at the desired frequency. The mixer produces a signal at the sum and difference of the two input frequencies [16][17].
Excitation of the MKIDs: After generation, the probe signals are sent to the MKIDs in order to excite them. In response, the MKIDs absorb photons and modify their resonance frequency. By monitoring the phase and amplitude of the reflected probe signals, it is possible to track changes in resonant frequency [16][17].

Given the complexity of this process and the resulting need for effective readout systems, MKID technology is a prime candidate for GPU acceleration.

2.2 Advantages of GPU Acceleration for MKID Readout

Parallel Processing Power. The immense parallel processing capacity of GPUs is one of their most notable characteristics. GPUs are made for performing many parallel activities at once, in contrast to Central Processing Units (CPUs) [18], which are intended for sequential processing. This architecture matches the demands of MKID readout, where handling massive datasets over numerous channels is crucial [19]. MKID detectors frequently consist of arrays with hundreds or even thousands of resonators, each of which generates data that needs to be analyzed simultaneously. Such data-intensive operations are well suited for GPUs, which makes them a suitable solution for MKID readout applications [20][21].

High Memory Bandwidth. The larger memory bandwidth of GPUs is another significant benefit.
Data transfer bottlenecks are reduced because GPUs can efficiently move data between processing cores and memory. In MKID readout, where quick data capture and processing are essential, high memory bandwidth ensures that the GPU can process the data with minimal delay. This results in quicker readout times and better overall system performance [20][22].

Scalability. The scalability provided by GPUs fits the varied requirements of MKID studies. GPUs can adjust to the computing demands, whether working with small-scale systems or huge detector arrays. This scalability is especially advantageous because it enables researchers to customize their readout systems to the precise needs of their investigations. Additionally, it means that GPU-accelerated MKID readout systems can expand along with changing research initiatives [20].

Accessibility and Ease of Programming. GPU-based systems are more widely available and easier to program than FPGA-based ones. Since FPGA development often requires specific education and expertise, the number of professionals qualified to update or maintain the firmware is limited. GPUs, however, can be programmed using well-known languages and libraries. For instance, the CUDA platform from Nvidia provides a C++-compatible framework for GPU programming [19]. By making MKID technology more accessible, this allows a larger group of scientists and researchers to take advantage of its benefits [10][22].

2.3 Example from the Field

A case from recent research, described in [10], helps to demonstrate the potential of GPU acceleration in MKID readout. The ROACH (Reconfigurable Open Architecture Computing Hardware) boards from the CASPER collaboration, which typically rely on FPGA firmware for readout, were enhanced with GPU acceleration in this study.
The original purpose of the ROACH boards, which use Xilinx FPGAs, was to produce buffers for digital-to-analog converter chips (DACs) and to process raw data obtained by analog-to-digital converter chips (ADCs) in real time. The substantial knowledge required for firmware development and modification in this FPGA-centric strategy resulted in lengthy development timeframes.

GPU acceleration was implemented in the modified system to shift computationally heavy activities from the FPGAs to the desktop computer's GPUs. The development time for the required software, which was built in Python and C++, was cut down to only a few days by utilizing Nvidia's CUDA framework. Because of this design approach, researchers were able to quickly modify the readout system to satisfy the demands of various detector programs. The addition of GPUs sped up development while also improving the system's adaptability and flexibility. As the needs of the experiments changed, researchers were able to quickly modify the readout algorithms. This effective use of GPU acceleration shows how capable and responsive MKID readout systems can become with this technology.

Chapter 3 Methodology

3.1 Readout algorithm in the exemplary study

The exemplary study used in this work is the paper "A Flexible GPU-Accelerated Radio-frequency Readout for Superconducting Detectors" by Lorenzo Minutolo et al. from 2019 [10]. To quickly summarize, in this research the CASPER team added GPU acceleration to ROACH boards. Historically, these devices relied on FPGA firmware for data readout. The first tasks of the Xilinx FPGA-equipped ROACH boards were real-time processing of raw data from ADCs and buffer construction for DACs. These functions were originally performed on the boards themselves. Due to the skill set needed to build and alter firmware in this FPGA-centric strategy, development took longer.
The revised system moved computationally heavy activities from FPGAs to desktop GPUs using GPU acceleration. Using Nvidia's CUDA library cut the development time of the Python and C++ applications to days. Thanks to this rapid development, researchers can quickly adjust the readout system to meet the requirements of different detector programs. GPUs accelerated development and increased the system's adaptability and versatility. The researchers showed that they could quickly modify the readout algorithms as experimental needs changed. GPU acceleration has improved the efficiency and responsiveness of MKID readout systems.

All of the code for the GPU server and the readout used in that paper can be accessed via the GitHub page titled "GPU-SDR", and it is used here to investigate the methods and steps of the readout algorithm. As an example of the measurement and analysis I take the "swipe-parameters.py" script, which according to the author is "the most similar to our acquisition routine". This script exercises the Vector Network Analyzer (VNA) functionality in detail as it applies to the readout of Microwave Kinetic Inductance Detectors (MKIDs). Its modular structure makes it possible to examine each of the phases that matter for the overall assessment of the experimental configuration. The stages of the algorithm are described in order, together with their contribution to the understanding of the system.

To initiate the process, a multiplication of a seed VNA measurement is performed to preserve phase coherence. GPU acceleration, which is enabled during server initialization, increases the processing efficiency.
The line delay is then measured and analyzed in post-processing so that the delays inherent in the system can be compensated, which is a prerequisite for precise and accurate subsequent measurements. Next, the script performs the seed VNA measurement with the Single-VNA function, which acquires the frequency sweep data. This phase yields the resonance characteristics needed to understand the behavior of the MKID resonators. The subsequent analysis initializes, fits, and plots the resonators, and extracts the parameters required for later iterations of the analysis.

The program then enters a loop that traverses a range of gain configurations and executes a VNA measurement for each of them. The resonators are first initialized from the most recent VNA scan or from the seed VNA, which keeps the experimental configuration coherent. The resonators are then fitted to show how the instruments react to different levels of gain, and therefore how the MKID resonators behave under various operating conditions. In each iteration the script also acquires noise data at the selected tones. This phase characterizes the noise properties inherent in the system, which is significant for applications in condensed matter physics and quantum technologies.
The script gains flexibility by transferring the resonator group from the VNA files to the noise files, which enables a comparative analysis of the resonators' response in the presence and absence of external effects. The diagnostic VNA noise plots, which are generated at the end of each workflow iteration, give a graphical depiction of the noise characteristics. This stage allows a prompt evaluation of the experiment's results and helps to characterize and improve the experimental setup.

3.2 Accelerating MKID Readout with GPU

The script uses GPU acceleration at the critical phases of the MKID analysis, in particular the Vector Network Analyzer (VNA) measurements. This results in improved computational efficiency and faster operation. GPU acceleration, which is enabled on the server through parallel processing, provides significant advantages when complex computations and procedures involving large amounts of data must be executed quickly.

Acquiring the seed VNA measurement, which is the first phase of the script, requires a significant amount of CPU resources. GPU acceleration can be enabled when the server is launched in order to make full use of the GPU's parallel processing capabilities. As a result, a substantial reduction in time is anticipated for the coherent replication of phase-sensitive measurements. The accelerated execution ensures that the subsequent operations, which rely on precise and prompt initialization, start from a sound foundation.

GPU acceleration is most significant during the second stage, measuring and evaluating the line delay, which is essential for compensating for delays introduced by the system. GPUs use their parallel processing capabilities to expedite the delay computation.
This allows for quick adjustments and enhances the overall accuracy of the system.

While the Single-VNA function is being executed for the seed VNA measurement, GPU acceleration speeds up the acquisition of frequency sweep data. This acceleration is critical when handling a large number of data points and iterations. The parallel processing capabilities of GPUs make it possible to acquire detailed resonator parameters efficiently, which increases the overall pace of the workflow.

The Fast Fourier Transform (FFT) is a procedure frequently employed in signal processing, data reduction, and large-scale physics simulations [23], and it is a crucial operation in many scientific and professional fields. It also requires substantial computational resources and memory bandwidth, particularly for long FFTs.

GPUs have been shown to improve the performance of computationally demanding applications tenfold compared with conventional multi-core CPUs [23], which offers a promising way to tackle these obstacles. Most current efforts to apply GPUs to FFT problems concentrate on fitting the FFT problem within the GPU's memory and on improving the data transfers between the CPU and the GPU.

Strategies such as decomposing FFT problems and optimizing the architecture of processing elements in GPUs [23] have been crucial in making use of the hierarchical memory structure inherent in GPUs. The FFT method makes it feasible to divide a Discrete Fourier Transform (DFT) into more manageable components.
Numerous implementations, including the Cooley-Tukey Fast Fourier Transform, decrease the computational expense of the algorithm from O(N^2) to O(N log N).

In radio astronomy data correlation, the ability to identify distinct signals in the frequency domain is critical, and the FFT is an indispensable instrument for this purpose. The Fastest Fourier Transform in the West (FFTW) is a widely used CPU implementation, while the CUDA Fast Fourier Transform library (cuFFT) is an example of how GPU acceleration is used to speed up Fourier transform calculations.

GPUs have become essential components in parallel computing applications, with cosmology being one such domain [24]. GPUs are widely recognized for their robust computational capabilities and intrinsic parallelism. They are distinguished by a multiplicity of data-parallel threads and operate under a programming paradigm that differs substantially from that of conventional CPUs. Originally built to power visual applications, GPUs have undergone a significant transformation into parallel accelerators for scientific computation [25]. They now outperform CPUs in terms of processing capabilities. Considerable research has been devoted to GPU parallelization techniques for FFT-related applications in the field of radio astronomy correlation. A variety of strategies are used to optimize the use of the GPU, ranging from single-threaded CPU methods to highly parallelized pair-parallel strategies. These solutions, which consider GPU memory operations, data staging, and thread balancing [25], illustrate the complexity of the optimization process.
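The reduction from O(N^2) to O(N log N) discussed in this section can be illustrated with a minimal sketch. The code below is a plain-Python, CPU-only radix-2 Cooley-Tukey FFT checked against a direct DFT. It is not the GPU implementation used by the readout server or by any CUDA library; the function names, the 64-sample test signal, and the tone bin are assumptions chosen only for illustration.

```python
import cmath
import math


def naive_dft(x):
    """Direct O(N^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    # Split into even- and odd-indexed samples and transform each half.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Combine the two half-size transforms with a twiddle factor.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_bin(spectrum):
    """Index of the strongest bin in the first half of the spectrum."""
    half = spectrum[: len(spectrum) // 2]
    return max(range(len(half)), key=lambda k: abs(half[k]))


# Hypothetical test signal: one real tone that falls exactly on bin 5.
N = 64
TONE_BIN = 5
samples = [math.cos(2 * math.pi * TONE_BIN * t / N) for t in range(N)]

fast = fft_radix2(samples)
slow = naive_dft(samples)
max_error = max(abs(a - b) for a, b in zip(fast, slow))

print("strongest bin:", peak_bin(fast))
print("largest difference between FFT and direct DFT:", max_error)
```

Each recursion level performs N/2 butterfly operations that do not depend on one another, which is the property that GPU FFT libraries exploit when they distribute the butterflies across many threads.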
3.3 Replicate measurements from exemplary study

3.3.1 Experimental setup and system architecture

Our project's major goal is to recreate and implement the system that Minutolo et al. [10] first presented in the paper mentioned above. To acquire information about incoming radiation, the system uses Microwave Kinetic Inductance Detectors in conjunction with a Software Defined Radio (SDR) configuration. The USRP X310 is the system's primary component, and it is equipped with a UBX160 daughterboard. It is designed to provide a bandwidth of 100 Msps and the ability to manage frequencies up to 6 GHz.

Figure 3.1: USRP X310 FPGA [26]
Figure 3.2: UBX160 daughterboard for USRP [27]

The block diagram shows in detail the hardware signal processing architecture of the USRP X310. In this layout the transmitter path amplifies the outgoing signals, which are then sent out via a TX/RX antenna. The receiver path, in contrast, collects incoming signals using a separate RX2 antenna. These signals are then amplified and filtered using low-pass and band-specific filters to accommodate different frequency ranges. The conversion paths are particularly significant: the down-conversion path uses a reference signal to ensure precision and converts high-frequency signals to a lower intermediate frequency or directly to baseband I/Q components. Before the signal is subjected to the final I/Q modulation for transmission, the up-conversion circuit passes it through a 2.44 GHz band-pass filter, which helps to improve the signal's clarity. This architecture, which combines high-speed data converters and FPGA-based signal processing, enables the transfer of high-rate data, which is required for the exact operation of MKID detectors.

Another advantage of our system is its high computing power.
Figure 3.3: Block diagram of UBX160 daughterboard [27]

The SDR system uses a 10-gigabit network card with an SFP+ cable connector to transport down-sampled data to a GPU server. The server is equipped with a Tesla K40m graphics processing unit, 140 gigabytes of random access memory (RAM), and a 64-core processor, which together give enough computing capacity to handle the high data flow produced by MKIDs.

3.3.2 Performing measurements

A Python-based program deployed on the GPU server performs several analytical operations on the input signals. Its processing includes noise acquisition, delay measurements, resonator calculations, and analysis with a Vector Network Analyzer. When the readout begins, the GPU server provided by the GPU-SDR repository is started. The server is built with the GCC compiler and linked against the Boost and UHD libraries, which are then used to configure the USRP and make measurements on the FPGA. In addition, a connection is formed between the server and the graphics processing unit (GPU), in this case the Tesla K40m. The GPU is used to handle the large amount of data from the MKIDs that is transmitted via the USRP. While it is feasible to start the GPU server on one computer and send measurement commands remotely, our configuration performs all operations within a single setup. This is done to improve dependability and reduce data transmission delay by reducing the number of connected devices. Finally, the server connects to the USRP and the whole system is ready to perform measurements. A second Python session is used to send the commands for signal analysis.

Figure 3.4: Starting GPU server

This session uses the pyUSRP library, which includes all of the necessary functions for VNA (Vector Network Analyzer), noise acquisition, delay, and resonator measurements.
The measurements are carried out using Python methods, whereas the server itself runs C++ code. Because the code requires Python version 2.7, which is now old and unsupported, our configuration required the use of a Python virtual environment. Finding the requisite versions of the Python modules, the GCC compiler, the Boost library, and the UHD library proved to be a significant challenge throughout the project. This occurred because the method was developed in 2019, and the versions it depends on have since become obsolete and are no longer available. Attempts to use the automatic setup were unsuccessful because specific modules were missing from some versions. The setup was therefore configured manually. This required testing and confirming the compatibility of multiple versions in order to install the algorithm and the required libraries on the current Ubuntu setup. This goal was accomplished by using a virtual environment, which ensures that only the manually installed versions are used and avoids automatic system updates.

The Python session is started in the virtual environment, and the first step is to import the pyUSRP library, which allows the session to connect to the server and perform measurements as shown below.

Figure 3.5: Connecting to the server and performing VNA measurements using a Python session from the virtual environment

Once the environment has been created, a Python script is used to control and interact with the Universal Software Radio Peripheral (USRP) device.

Figure 3.6: Self-written Python script used to perform measurements of the RF signal

The script's first statement establishes a connection with the USRP in preparation for a series of measurements. The function called by the script then runs a systematic scan of a specified frequency range and stores the resulting delay data in order to measure the signal delay along a specific pathway.
After the sweep, the collected data is analyzed to characterize the signal delay, and the results are saved to files for later retrieval. The VNA measurement examines the system under test by showing how components such as amplifiers and filters affect the signals. These measurements cover a wide range of frequencies and configurations to provide a full assessment of the system's performance. Plotting libraries are used to visualize the gathered data so that it can be reviewed quickly and in detail. Noise measurements are as crucial as VNA measurements. The script selects specific frequencies from the VNA data and quantifies the noise at those frequencies, taking into account parameters that affect precision, such as decimation rate and averaging. Further insight into the system can be obtained by comparing the noise data with the resonator data and identifying any correlations or patterns. Diagnostic procedures are performed to verify measurement precision and reliability.

Chapter 4 Results

4.1 Readout of GHz frequency signals

Figure 4.1: VNA analysis of 1 GHz sinusoid signal using front end B

The presented plot is the result of a Vector Network Analyzer test in which a USRP was used to read out a signal with a frequency of 1 GHz. Typical frequencies of MKID-generated signals are in the range 1-4 GHz, so this test is a predictor of whether the readout is ready to be employed in an MKID experimental setup. The plot depicts the frequency spectrum using two graphs: the upper graph shows the amplitude in decibels (dB), while the lower graph shows the phase in radians (rad). A notable peak can be seen in the magnitude plot at around 1.025 GHz.
The peak represents the sinusoidal signal under test, and its narrow width indicates a high quality factor (Q factor). This suggests that the signal has negligible energy dissipation relative to its frequency, which is typical of a resonant system with low damping. The magnitude of the signal far exceeds the expected noise level of -100 dB, indicating a clear and reliable reading.

The phase plot shows a continuous decrease across the whole spectrum, which is indicative of a swept-frequency response. Resonance appears as an abrupt discontinuity (phase shift) at the frequency where the magnitude peaks; such a phase shift is expected as the signal approaches the resonance frequency.

These results indicate that the method used to read out the 1 GHz sinusoidal signal is precise. The prominent peak in the magnitude plot, together with a phase shift at the expected frequency, confirms that the signal is detected accurately. Furthermore, the peak's high signal-to-noise ratio suggests good RF signal readout quality. However, anomalous peaks occasionally appear in the magnitude plot; they could represent small reflections or interference inside the system, but they do not appear to affect the signal of interest. In addition, this readout was tested on both front ends of the USRP, and front end B was found to perform better in terms of noise.

Figure 4.2: VNA analysis of 1 GHz sinusoid signal using front end A

In conclusion, the VNA plot shows that a 1 GHz sinusoidal signal was read out precisely and with good quality. The observed phase shift and strong peak in the figure indicate that the USRP-based RF signal measurement hardware is functioning well.
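The link between peak width and quality factor used in this section can be made quantitative with the standard relation Q = f0 / Δf, where Δf is the full width of the peak at half power (the -3 dB points). The following is a minimal sketch on a synthetic Lorentzian peak, not on the measured data. The function `q_from_half_power_width`, the assumed Q of 5000, and the frequency grid are illustrative assumptions; only the 1.025 GHz peak position is taken from the text.

```python
import numpy as np

def q_from_half_power_width(freqs_hz, power):
    """Estimate Q = f0 / FWHM from a single-peaked power spectrum."""
    peak = np.argmax(power)
    half = power[peak] / 2.0
    above = np.where(power >= half)[0]
    # Width between the outermost samples at or above half power,
    # i.e. approximately between the -3 dB points.
    fwhm = freqs_hz[above[-1]] - freqs_hz[above[0]]
    return freqs_hz[peak] / fwhm

# Hypothetical Lorentzian peak centred at 1.025 GHz with an assumed Q.
f0, true_q = 1.025e9, 5000.0
freqs = np.linspace(f0 - 2e6, f0 + 2e6, 40001)
power = 1.0 / (1.0 + (2 * true_q * (freqs - f0) / f0) ** 2)

q_est = q_from_half_power_width(freqs, power)
```

The estimate is limited by the frequency resolution of the sweep, and it assumes a single clean peak; spurious peaks of the kind noted above would need to be excluded first.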
4.2 Detection and measurements on 80 MHz signal

The VNA plot gives detailed information on the transmission and reflection characteristics of the radio frequency (RF) system over a wide frequency range. The plot shows a largely flat baseline with a notable peak at around 80 MHz. This frequency is most likely the resonant frequency of one of the components in the MKID arrangement, either the resonator or a feature of the feedline. Accurately interpreting MKID responses to incoming photons requires a full understanding of the resonance frequencies, because these frequencies are used to identify and analyze the MKID responses.

A phase response plot is displayed beneath the VNA magnitude plot and shows how the phase of the signal varies with frequency. In MKID applications, the phase response provides useful information on the dispersion properties of the transmission line or resonators employed. As expected, a phase shift is present at the resonant frequency, which makes it possible to distinguish between several resonant modes.

Figure 4.3: VNA analysis of 80 MHz sinusoid signal

The resonator plot is an I-Q plot, which is critical to the operation of MKIDs. It includes both the in-phase and quadrature components and shows the resonator responses on the complex plane. Individual loops can represent the behavior of a single resonator. The interaction of photons with an MKID alters the inductance of the resonator, causing the loop to shift in the I-Q plane. The dimensions and positions of these loops are used to assess the quality factor (Q) and coupling of the resonator. The graph on the right can be used to determine the frequency shift (∆f) versus phase. This information is useful for tuning the system and determining the MKIDs' operational bandwidth.
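The loop behaviour described above can be illustrated with the widely used ideal model of a resonator coupled to a feedline, S21(f) = 1 - (Q/Qc) / (1 + 2jQ(f - f0)/f0). The sketch below is a simplified model, not the fitting routine used for the measurements. The function `notch_s21`, the quality factors, and the 1 kHz frequency shift are assumed values for illustration; only the 80 MHz resonance is taken from this section.

```python
import numpy as np

def notch_s21(freqs_hz, f0, q_total, q_coupling):
    """Ideal transmission past a resonator coupled to a feedline."""
    x = (freqs_hz - f0) / f0
    return 1.0 - (q_total / q_coupling) / (1.0 + 2j * q_total * x)

# Assumed resonator parameters (hypothetical, for illustration only).
f0, q_total, q_coupling = 80e6, 20000.0, 30000.0
freqs = np.linspace(f0 - 50e3, f0 + 50e3, 2001)
s21 = notch_s21(freqs, f0, q_total, q_coupling)

# In the I-Q plane the sweep traces a circle of diameter Q / Qc that
# touches the off-resonance point 1 + 0j.
center = 1.0 - q_total / (2.0 * q_coupling)
radius = q_total / (2.0 * q_coupling)
distance = np.abs(s21 - center)

# A small downward shift of f0 (for example from increased kinetic
# inductance) moves the fixed-frequency readout point along the loop.
probe = notch_s21(np.array([f0]), f0, q_total, q_coupling)[0]
shifted = notch_s21(np.array([f0]), f0 - 1e3, q_total, q_coupling)[0]
phase_shift = np.angle((shifted - center) / (probe - center))
```

In this model the loop diameter gives the ratio Q/Qc, and the angle travelled around the loop centre is what a phase readout converts into a frequency shift.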
Figure 4.4: Resonators plot

The diagnostic plot displays the system's noise characteristics overlaid on the averaged noise acquisition and VNA traces. High sensitivity and accuracy in photon detection with MKIDs require low noise levels. Using this plot, researchers can determine whether the electronic noise level is low enough not to interfere with the signals from the MKIDs. An additional diagnostic plot allows a complete evaluation of the noise and the system's response, which may indicate how stable the system remains over time or under varied conditions. Stability is critical for MKIDs since deviations can impair photon detection accuracy.

Figure 4.5: Noise acquisition

Figure 4.6: Plot of VNA data indicating average noise

The script output for line delay analysis is used to account for any time delays in the signal path. Because such a delay can influence the temporal precision of photon detection, it is critical to thoroughly characterize and reduce it in MKID setups.

Figure 4.7: Line delay measurements

Taken together, these plots and analyses show that the system exhibits the desired resonance characteristics, that the noise environment is controllable, and that the components, such as transmission lines, are well understood in terms of their delay. This demonstrates that the system was set up properly and is operating within the expected parameters for MKID readout. The system is therefore suitable for use in a real experimental configuration to detect photons with MKIDs, with a reasonable expectation that it will be reliable and accurate.

Chapter 5 Conclusion

This study has laid a solid framework for the potential integration of GPU-accelerated readout systems into microwave kinetic inductance detectors.
We have begun to address the inherent limitations of FPGA-based systems, such as their inflexibility and high power consumption, by investigating GPU capabilities. Our research proposes a new alternative for MKID applications which, while not yet confirmed in real experimental settings, shows potential theoretical benefits.

The initial results of adding GPU acceleration to MKID readout systems indicate a possible ability to handle large arrays of detectors and perform sophisticated computations with increased speed and efficiency. This suggests a shift in how MKID technology may be used in the future, potentially improving the detection and analysis of electromagnetic signals in a variety of scientific domains. The next stage of the investigation will involve physically incorporating this GPU-accelerated readout method into practical MKID setups in order to objectively assess its effectiveness. This will allow a direct comparison with conventional FPGA-based systems, potentially verifying the theoretical advantages of GPU acceleration. Additional research and innovation may lead to more advanced and responsive detectors, improving our understanding of astrophysics, quantum computing, and materials science.

To summarize, this study shows that GPU-accelerated MKID readout systems are both feasible and theoretically advantageous. Future experimental tests are needed to demonstrate the actual impact and performance gains. GPU technologies continue to advance, which should improve the performance and capacity of MKID systems and help bring them to the forefront of scientific research and innovation.

Bibliography

[1] Joris van Rantwijk et al. “Multiplexed Readout for 1000-Pixel Arrays of Microwave Kinetic Inductance Detectors”. In: IEEE Transactions on Microwave Theory and Techniques 64.6 (2016), pp. 1876–1883. doi: 10.1109/TMTT.2016.2544303.

[2] Omid Noroozian et al.
“Crosstalk Reduction for Superconducting Microwave Resonator Arrays”. In: IEEE Transactions on Microwave Theory and Techniques 60.5 (2012), pp. 1235–1243. doi: 10.1109/TMTT.2012.2187538.

[3] J. Zmuidzinas and P.L. Richards. “Superconducting detectors and mixers for millimeter and submillimeter astrophysics”. In: Proceedings of the IEEE 92.10 (2004), pp. 1597–1616. doi: 10.1109/JPROC.2004.833670.

[4] Lorenza Ferrari et al. “Antenna Coupled MKID Performance Verification at 850 GHz for Large Format Astrophysics Arrays”. In: IEEE Transactions on Terahertz Science and Technology 8.1 (2018), pp. 127–139. doi: 10.1109/TTHZ.2017.2764378.

[5] ALMA. Accessed September 27, 2023. 2023. url: https://www.almaobservatory.org/en/about-alma/.

[6] Masato Naruse et al. “Optical Efficiencies of Lens-Antenna Coupled Kinetic Inductance Detectors at 220 GHz”. In: IEEE Transactions on Terahertz Science and Technology 3.2 (2013), pp. 180–186. doi: 10.1109/TTHZ.2012.2237029.

[7] Mohsen Hosseini, Wei-Ting Wong, and Joseph C. Bardin. “A 0.4–1.2 GHz SiGe Cryogenic LNA for Readout of MKID Arrays”. In: 2019 IEEE MTT-S International Microwave Symposium (IMS). 2019, pp. 164–167. doi: 10.1109/MWSYM.2019.8701100.

[8] Matthias Arndt et al. “A Data Acquisition System for Kinetic-Inductance Detectors”. In: IEEE Transactions on Applied Superconductivity 26.3 (2016), pp. 1–4. doi: 10.1109/TASC.2016.2540241.

[9] K. J. Ramos, L. H. Arnaldi, and L. Tosi. “Towards a Low-Cost Readout System for Arrays of Cryogenic Detectors”. In: 2023 Argentine Conference on Electronics (CAE). 2023, pp. 19–23. doi: 10.1109/CAE56623.2023.10086976.

[10] Lorenzo Minutolo et al. “A Flexible GPU-Accelerated Radio-frequency Readout for Superconducting Detectors”. In: IEEE Transactions on Applied Superconductivity 29.5 (2019), pp. 1–5. doi: 10.1109/TASC.2019.2912027.

[11] Ran Duan et al. “An open-source readout for MKIDs”. In: Proceedings of SPIE - The International Society for Optical Engineering 7741 (July 2010). doi: 10.1117/12.856832.

[12] Beatriz Aja et al. “Analysis and Performance of Lumped-Element Kinetic Inductance Detectors for W-Band”. In: IEEE Transactions on Microwave Theory and Techniques 69.1 (2021), pp. 578–589. doi: 10.1109/TMTT.2020.3038777.

[13] H. McCarrick et al. “Horn-coupled, commercially-fabricated aluminum lumped-element kinetic inductance detectors for millimeter wavelengths”. In: Review of Scientific Instruments 85.123117 (2014). doi: 10.1063/1.4903855.

[14] Jinxuan Chen et al. “A Novel GPU Acceleration Algorithm Based on CUDA and MPI for Ray Tracing Wireless Channel Modeling”. In: 2023 IEEE Wireless Communications and Networking Conference (WCNC). 2023, pp. 1–6. doi: 10.1109/WCNC55385.2023.10118847.

[15] Sean McHugh et al. “A readout for large arrays of microwave kinetic inductance detectors”. In: The Review of Scientific Instruments 83 (Apr. 2012), p. 044702. doi: 10.1063/1.3700812.

[16] K. J. Ramos, L. H. Arnaldi, and L. Tosi. “Towards a Low-Cost Readout System for Arrays of Cryogenic Detectors”. In: 2023 Argentine Conference on Electronics (CAE). 2023, pp. 19–23. doi: 10.1109/CAE56623.2023.10086976.

[17] Samuel W. Belling et al. “A Frequency Domain Multiplexing Technique for Multi-Channel Detector Instrumentation”. In: 2018 IEEE Nuclear Science Symposium and Medical Imaging Conference Proceedings (NSS/MIC). 2018, pp. 1–6. doi: 10.1109/NSSMIC.2018.8824306.

[18] Eriko Nurvitadhi et al. “Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC”. In: 2016 International Conference on Field-Programmable Technology (FPT). 2016, pp. 77–84. doi: 10.1109/FPT.2016.7929192.

[19] Svetlin A. Manavski. “CUDA Compatible GPU as an Efficient Hardware Accelerator for AES Cryptography”. In: 2007 IEEE International Conference on Signal Processing and Communications. 2007, pp. 65–68. doi: 10.1109/ICSPC.2007.4728256.

[20] John Nickolls and William J. Dally. “The GPU Computing Era”. In: IEEE Micro 30.2 (2010), pp. 56–69. doi: 10.1109/MM.2010.41.

[21] Lingze Zhang, Yongxing Du, and Daocheng Wu. “GPU-Accelerated FDTD simulation of human tissue using C++ AMP”. In: 2015 31st International Review of Progress in Applied Computational Electromagnetics (ACES). 2015, pp. 1–2.

[22] Shuai Che et al. “Accelerating Compute-Intensive Applications with GPUs and FPGAs”. In: 2008 Symposium on Application Specific Processors. 2008, pp. 101–107. doi: 10.1109/SASP.2008.4570793.

[23] Shuo Chen and Xiaoming Li. “A hybrid GPU/CPU FFT library for large FFT problems”. In: 2013 IEEE 32nd International Performance Computing and Communications Conference (IPCCC). 2013, pp. 1–10. doi: 10.1109/PCCC.2013.6742796.

[24] Zhicheng Zhao and Yaqun Zhao. “The Optimization of FFT Algorithm Based with Parallel Computing on GPU”. In: 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). 2018, pp. 2003–2007. doi: 10.1109/IAEAC.2018.8577843.

[25] Chris Harris, Karen Haines, and Lister Staveley-Smith. “GPU accelerated radio astronomy signal convolution”. In: Experimental Astronomy 22 (Oct. 2008), pp. 129–141. doi: 10.1007/s10686-008-9114-9.

[26] Ettus Research. USRP X310 High Performance Software Defined Radio. Product Brochure. Ettus Research. url: https://www.ettus.com/all-products/x310-kit/ (visited on 04/20/2024).

[27] Ettus Research. UBX 10 MHz - 6 GHz Rx/Tx, 160 MHz BW. url: https://www.ettus.com/all-products/ubx160/ (visited on 04/20/2024).