Accurate and robust tracking of objects is of growing interest amongst the computer vision scientific community. The ability of a multi-sensor system to detect and track objects, and accurately predict their future trajectory is critical in the context of mission- and safety-critical applications. Remotely Piloted Aircraft System (RPAS) are currently not equipped to routinely access all classes of airspace since certified Detect-and-Avoid (DAA) systems are yet to be developed. Such capabilities can be achieved by incorporating both cooperative and non-cooperative DAA functions, as well as providing enhanced communications, navigation and surveillance (CNS) services. DAA is highly dependent on the performance of CNS systems for Detection, Tacking and avoiding (DTA) tasks and maneuvers. In order to perform an effective detection of objects, a number of high performance, reliable and accurate avionics sensors and systems are adopted including non-cooperative sensors (visual and thermal cameras, Laser radar (LIDAR) and acoustic sensors) and cooperative systems (Automatic Dependent Surveillance-Broadcast (ADS-B) and Traffic Collision Avoidance System (TCAS)). In this paper the sensors and system information candidates are fully exploited in a Multi-Sensor Data Fusion (MSDF) architecture. An Unscented Kalman Filter (UKF) and a more advanced Particle Filter (PF) are adopted to estimate the state vector of the objects based for maneuvering and non-maneuvering DTA tasks. Furthermore, an artificial neural network is conceptualised/adopted to exploit the use of statistical learning methods, which acts to combined information obtained from the UKF and PF. After describing the MSDF architecture, the key mathematical models for data fusion are presented. Conceptual studies are carried out on visual and thermal image fusion architectures.