
Policy learning for autonomous feature tracking

Published in Autonomous Robots

Abstract

We consider the problem of tracking the structure of oceanographic features using autonomous underwater vehicles (AUVs). Solving this problem requires the construction of a control strategy that determines the actions of the AUV based on its current state, as measured by on-board sensors, and on its historic trajectory (including sensed data). We approach this task by applying plan-based policy learning: a large set of sampled problems is solved using a planner and, from the resulting plans, a decision tree is learned using an established machine-learning algorithm; this decision tree forms the policy. We evaluate our approach in simulation and report on sea trials of a prototype of a learned policy. We indicate some of the lessons learned from this deployed system and further evaluate an extended policy in simulation.
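As a concrete illustration of the pipeline described in the abstract, the sketch below shows plan-based policy learning in miniature. It is not the authors' implementation: the plan_problem() planner interface and the state features are hypothetical placeholders, and scikit-learn's DecisionTreeClassifier stands in for whichever established decision-tree learner the paper uses.

# Minimal sketch of plan-based policy learning; all names below are illustrative.
import random
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["depth", "chlorophyll", "dist_to_peak"]   # hypothetical state features

def sample_problem(rng):
    """Draw a random initial state for a feature-tracking episode (placeholder)."""
    return {"depth": rng.uniform(0.0, 30.0),
            "chlorophyll": rng.uniform(0.0, 5.0),
            "dist_to_peak": rng.uniform(0.0, 500.0)}

def plan_problem(state):
    """Placeholder for the planner: returns a plan as a list of (state, action) pairs.
    A real implementation would invoke a numeric planner on the sampled problem."""
    pairs, s = [], dict(state)
    for _ in range(10):
        action = "descend" if s["chlorophyll"] < 2.5 else "ascend"
        pairs.append((dict(s), action))
        s["depth"] += 1.0 if action == "descend" else -1.0
        s["chlorophyll"] += 0.3 if action == "descend" else -0.3
    return pairs

# 1. Solve a large set of sampled problems with the planner.
rng = random.Random(0)
examples = []
for _ in range(1000):
    examples.extend(plan_problem(sample_problem(rng)))

# 2. Learn a decision tree mapping state features to actions; the tree is the policy.
X = [[s[f] for f in FEATURES] for s, _ in examples]
y = [a for _, a in examples]
policy = DecisionTreeClassifier(max_depth=8).fit(X, y)

# 3. At execution time, the tree selects the next action from the sensed state.
def act(sensed_state):
    return policy.predict([[sensed_state[f] for f in FEATURES]])[0]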




Notes

  1. The vehicle can only communicate and acquire GPS coordinates when on the surface, but the blooms are usually concentrated below the surface. Furthermore, changing buoyancy has low energy cost and allows the vehicle to achieve lateral velocity by gliding through the water as it rises and falls. Finally, a yo-yo pattern allows the vehicle to track the structure of the bloom in three dimensions.

  2. More details on plan generation are provided in Sect. 4.2.

  3. Several figures in the remainder of the paper are presented in colour and are difficult to interpret in monochrome. The reader is recommended to view them using an appropriate medium.

  4. We gratefully acknowledge Mike Godin for his work on this generator.

  5. Universal Transverse Mercator coordinates.

  6. \(2*(25/\tan (25))\); see the worked reading of this expression after these notes.

  7. Variability in the water column is along the vertical dimension. Since the Dorado platform can only move forward, the yo-yo pattern is the most efficient mechanism for the scientific study of water column properties.

  8. Doppler Velocity Log, which estimates speed over ground.
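Footnote 6 gives its expression without stating units or context. Under the assumption (ours, not stated above) that the numerator 25 is a 25 m depth excursion and the argument of the tangent is a 25° glide angle, the expression would give the horizontal distance covered by one descend-and-ascend cycle of the yo-yo pattern:

\[ 2 \times \frac{25\,\mathrm{m}}{\tan(25^{\circ})} \approx 2 \times \frac{25\,\mathrm{m}}{0.466} \approx 107\,\mathrm{m}. \]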


Acknowledgments

The KCL authors are partially funded by the EU FP7 Project 288273 (PANDORA) and the EPSRC Project “Automated Modelling and Reformulation in Planning” (EP/G0233650). The MBARI authors are funded by a block grant from the David and Lucile Packard Foundation to MBARI, and in part by NSF Grant No. 1124975 and NOAA Grant No. NA11NOS4780055.

Author information

Corresponding author

Correspondence to Daniele Magazzeni.

Appendix A: Patch generation code

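In the published article the patch-generation code appears as an image (figure f) and is not reproduced in this text. Purely as an illustrative sketch of what a simple patch generator could look like, and not the generator used in the paper, the following builds a synthetic chlorophyll patch as a sum of three-dimensional Gaussian blobs on a regular grid; every name and parameter below is an assumption.

# Illustrative sketch only: a synthetic bloom "patch" built from 3D Gaussian blobs.
# Grid sizes, blob counts and concentration ranges are arbitrary placeholders.
import numpy as np

def generate_patch(nx=100, ny=100, nz=30, n_blobs=3, seed=0):
    """Return an (nx, ny, nz) array of simulated chlorophyll concentration."""
    rng = np.random.default_rng(seed)
    x, y, z = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij")
    field = np.zeros((nx, ny, nz))
    for _ in range(n_blobs):
        cx, cy = rng.uniform(0, nx), rng.uniform(0, ny)
        cz = rng.uniform(5, nz - 5)            # blooms concentrated below the surface
        sx, sy = rng.uniform(5, 20, size=2)    # horizontal extent (grid cells)
        sz = rng.uniform(2, 6)                 # vertical extent (grid cells)
        amp = rng.uniform(1.0, 5.0)            # peak concentration
        field += amp * np.exp(-0.5 * (((x - cx) / sx) ** 2
                                      + ((y - cy) / sy) ** 2
                                      + ((z - cz) / sz) ** 2))
    return field

if __name__ == "__main__":
    patch = generate_patch()
    print("peak simulated concentration:", round(float(patch.max()), 2))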

About this article

Cite this article

Magazzeni, D., Py, F., Fox, M. et al. Policy learning for autonomous feature tracking. Auton Robot 37, 47–69 (2014). https://doi.org/10.1007/s10514-013-9375-7

