
Policy learning for autonomous feature tracking

Published in Autonomous Robots

Abstract

We consider the problem of tracking the structure of oceanographic features using autonomous underwater vehicles (AUVs). Solving this problem requires the construction of a control strategy that determines the actions of the AUV based on its current state, as measured by on-board sensors, and on its historic trajectory (including sensed data). We approach this task by applying plan-based policy learning: a large set of sampled problems is solved using a planner and, from the resulting plans, a decision tree is learned using an established machine-learning algorithm; this decision tree forms the policy. We evaluate our approach in simulation and report on sea trials of a prototype of a learned policy. We indicate some of the lessons learned from this deployed system and further evaluate an extended policy in simulation.
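As a concrete illustration of the pipeline described in the abstract, the sketch below shows plan-based policy learning in miniature. It is not the authors' implementation: the plan_problem() planner interface and the state features are hypothetical placeholders, and scikit-learn's DecisionTreeClassifier stands in for whichever established decision-tree learner the paper uses.

# Minimal sketch of plan-based policy learning; all names below are illustrative.
import random
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["depth", "chlorophyll", "dist_to_peak"]   # hypothetical state features

def sample_problem(rng):
    """Draw a random initial state for a feature-tracking episode (placeholder)."""
    return {"depth": rng.uniform(0.0, 30.0),
            "chlorophyll": rng.uniform(0.0, 5.0),
            "dist_to_peak": rng.uniform(0.0, 500.0)}

def plan_problem(state):
    """Placeholder for the planner: returns a plan as a list of (state, action) pairs.
    A real implementation would invoke a numeric planner on the sampled problem."""
    pairs, s = [], dict(state)
    for _ in range(10):
        action = "descend" if s["chlorophyll"] < 2.5 else "ascend"
        pairs.append((dict(s), action))
        s["depth"] += 1.0 if action == "descend" else -1.0
        s["chlorophyll"] += 0.3 if action == "descend" else -0.3
    return pairs

# 1. Solve a large set of sampled problems with the planner.
rng = random.Random(0)
examples = []
for _ in range(1000):
    examples.extend(plan_problem(sample_problem(rng)))

# 2. Learn a decision tree mapping state features to actions; the tree is the policy.
X = [[s[f] for f in FEATURES] for s, _ in examples]
y = [a for _, a in examples]
policy = DecisionTreeClassifier(max_depth=8).fit(X, y)

# 3. At execution time, the tree selects the next action from the sensed state.
def act(sensed_state):
    return policy.predict([[sensed_state[f] for f in FEATURES]])[0]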




Notes

  1. The vehicle can only communicate and acquire GPS coordinates when on the surface, but the blooms are usually concentrated below the surface. Furthermore, changing buoyancy has low energy cost and allows the vehicle to achieve lateral velocity by gliding through the water as it rises and falls. Finally, a yo-yo pattern allows the vehicle to track the structure of the bloom in three dimensions.

  2. More details on plan generation are provided in Sect. 4.2.

  3. Several figures in the remainder of the paper are presented in colour and are difficult to interpret in monochrome. The reader is recommended to view them using an appropriate medium.

  4. We gratefully acknowledge Mike Godin for his work on this generator.

  5. Universal Transverse Mercator coordinates.

  6. \(2*(25/\tan (25))\); see the worked reading of this expression after these notes.

  7. Variability in the water column is along the vertical dimension. Since the Dorado platform can only move forward, the yo-yo pattern is the most efficient mechanism for the scientific study of water column properties.

  8. Doppler Velocity Log, which estimates speed over ground.
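Footnote 6 gives its expression without stating units or context. Under the assumption (ours, not stated above) that the numerator 25 is a 25 m depth excursion and the argument of the tangent is a 25° glide angle, the expression would give the horizontal distance covered by one descend-and-ascend cycle of the yo-yo pattern:

\[ 2 \times \frac{25\,\mathrm{m}}{\tan(25^{\circ})} \approx 2 \times \frac{25\,\mathrm{m}}{0.466} \approx 107\,\mathrm{m}. \]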


Acknowledgments

The KCL authors are partially funded by the EU FP7 Project 288273 (PANDORA) and the EPSRC Project “Automated Modelling and Reformulation in Planning” (EP/G0233650). The MBARI authors are funded by a block grant from the David and Lucile Packard Foundation to MBARI, and in part by NSF Grant No. 1124975 and NOAA Grant No. NA11NOS4780055.

Author information

Corresponding author

Correspondence to Daniele Magazzeni.

Appendix A: Patch generation code

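In the published article the patch-generation code appears as an image (figure f) and is not reproduced in this text. Purely as an illustrative sketch of what a simple patch generator could look like, and not the generator used in the paper, the following builds a synthetic chlorophyll patch as a sum of three-dimensional Gaussian blobs on a regular grid; every name and parameter below is an assumption.

# Illustrative sketch only: a synthetic bloom "patch" built from 3D Gaussian blobs.
# Grid sizes, blob counts and concentration ranges are arbitrary placeholders.
import numpy as np

def generate_patch(nx=100, ny=100, nz=30, n_blobs=3, seed=0):
    """Return an (nx, ny, nz) array of simulated chlorophyll concentration."""
    rng = np.random.default_rng(seed)
    x, y, z = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij")
    field = np.zeros((nx, ny, nz))
    for _ in range(n_blobs):
        cx, cy = rng.uniform(0, nx), rng.uniform(0, ny)
        cz = rng.uniform(5, nz - 5)            # blooms concentrated below the surface
        sx, sy = rng.uniform(5, 20, size=2)    # horizontal extent (grid cells)
        sz = rng.uniform(2, 6)                 # vertical extent (grid cells)
        amp = rng.uniform(1.0, 5.0)            # peak concentration
        field += amp * np.exp(-0.5 * (((x - cx) / sx) ** 2
                                      + ((y - cy) / sy) ** 2
                                      + ((z - cz) / sz) ** 2))
    return field

if __name__ == "__main__":
    patch = generate_patch()
    print("peak simulated concentration:", round(float(patch.max()), 2))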

About this article

Cite this article

Magazzeni, D., Py, F., Fox, M. et al. Policy learning for autonomous feature tracking. Auton Robot 37, 47–69 (2014). https://doi.org/10.1007/s10514-013-9375-7

