Graceful Stop For Mojolicious Apps: Preventing Timer Loss

by ADMIN 58 views

Have you ever faced the frustrating issue of your Mojolicious application not stopping gracefully, leading to lost timers and unexpected errors? You're not alone! Many developers encounter this problem when trying to implement a clean shutdown for their applications. In this article, we'll dive deep into the issue, explore the root causes, and provide you with practical solutions to ensure your Mojolicious apps stop gracefully every time.

Understanding the Problem

So, you are trying to gracefully stop a Mojolicious application, particularly from within a callback function for a non-blocking request. You might be using code similar to this:

... sub {
    Mojo::IOLoop->stop_gracefully;
    $server->stop;
    undef $server; # Must destroy server otherwise it does not stop.
}

Sometimes, it works perfectly, and other times, it throws a wrench in the gears with this error:

Mojo::Reactor::Poll: I/O watcher failed: Can't call method "_finish" on an undefined value at ../local-lib/lib/perl5/Mojo/Server/Daemon.pm line 247.

Delving deeper, you might see this in Mojo::Reactor::Poll:

Mojo::Reactor::Poll
sub _write {
  my ($self, $id) = @_;
  ...
  $self->ioloop->stream($id)->write($chunk => sub { $self->$next($id) });
}

In these failure scenarios, the finish event for the transaction mysteriously goes missing. Adding to the confusion, any scheduled timers seem to vanish into thin air.

The Timer Example

To illustrate the issue, consider this example with a timer:

use Mojo::Server::Daemon;
use Mojolicious::Lite;
use Mojo::UserAgent;
use Mojo::IOLoop;

hook after_build_tx => sub {
    my ($tx, $app) = @_;
    print "New transaction >>$tx<<\n";
    $tx->on(finish => sub {
        my ($tx) = @_;
        print "Finished transaction >>$tx<<\n";
    });
};

get '/test' => sub {
    Mojo::IOLoop->timer(2 => sub {
        print "Timer\n";
    });

    shift->render(text => 'Ok')
};

my $server = Mojo::Server::Daemon->new(app => app(), listen => ['http://*:7543']);
$server->start;

my $ua = Mojo::UserAgent->new(server => undef);
print "Issue request:\n";
$ua->get('http://127.0.0.1:7543/test', sub {
    Mojo::IOLoop->stop_gracefully;
    $server->stop;
    undef $server;
});

Mojo::IOLoop->start unless Mojo::IOLoop->is_running;

This code sets up a timer within a route, but when the application stops gracefully, the timer might not fire, leading to lost functionality.

Reproducing the Issue

To reproduce this behavior, you can use the following code snippet:

use Mojolicious::Lite;
use Mojo::Server::Daemon;
use Mojo::UserAgent;

hook after_build_tx => sub {
    my ($tx, $app) = @_;
    print "New transaction >>$tx<<\n";
    $tx->on(finish => sub {
        my ($tx) = @_;
        print "Finished transaction >>$tx<<\n";
    });
};

get '/test' => sub { shift->render(text => 'Ok') };

my $server = Mojo::Server::Daemon->new(app => app(), listen => ['http://*:7543']);
$server->start;

my $ua = Mojo::UserAgent->new(server => undef);
print "Issue request:\n";
$ua->get('http://127.0.0.1:7543/test', sub {
    Mojo::IOLoop->stop_gracefully;
    $server->stop;
    undef $server;
});

Mojo::IOLoop->start unless Mojo::IOLoop->is_running;

Save this as graceful.t and run it. You'll notice that sometimes it works fine, printing transaction details, and sometimes it crashes with the dreaded Can't call method "_finish" error:

$ perl -I ../local-lib/lib/perl5/ t/gracefull.t
[2025-10-13 17:32:47.05852] [128220] [info] Listening at "http://*:7543"
Web application available at http://127.0.0.1:7543
New transaction >>Mojo::Transaction::HTTP=HASH(0x614dd2368838)<<
[2025-10-13 17:32:47.06055] [128220] [trace] [0crZtoV1yxCB] GET "/test"
[2025-10-13 17:32:47.06080] [128220] [trace] [0crZtoV1yxCB] Routing to a callback
[2025-10-13 17:32:47.06100] [128220] [trace] [0crZtoV1yxCB] 200 OK (0.000441s, 2267.574/s)
Finished transaction >>Mojo::Transaction::HTTP=HASH(0x614dd2368838)<<

$ perl -I ../local-lib/lib/perl5/ t/gracefull.t
[2025-10-13 17:32:53.69074] [128257] [info] Listening at "http://*:7543"
Web application available at http://127.0.0.1:7543
New transaction >>Mojo::Transaction::HTTP=HASH(0x596c2acbc778)<<
[2025-10-13 17:32:53.69256] [128257] [trace] [HYS2zjcvO9XO] GET "/test"
[2025-10-13 17:32:53.69274] [128257] [trace] [HYS2zjcvO9XO] Routing to a callback
[2025-10-13 17:32:53.69286] [128257] [trace] [HYS2zjcvO9XO] 200 OK (0.00029s, 3448.276/s)
Mojo::Reactor::Poll: I/O watcher failed: Can't call method "_finish" on an undefined value at ../local-lib/lib/perl5/Mojo/Server/Daemon.pm line 247.

Diving into the Investigation

At a high level, the flow goes something like this: the User Agent (UA) sends a request, the server responds, a callback is triggered, the IOLoop is stopped, and the server is destroyed. Mojolicious juggles three file handles during this dance:

  • 5: The listening socket.
  • 11: Created for the UA.
  • 12: Created for the server.

Each tick, these file handles are checked in a seemingly random order, thanks to the unpredictable nature of Perl hashes. You can see this randomness in action here:

https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/Reactor/Poll.pm#L50-L56

The io hash, containing these file handles, looks something like this:

io => {
  11 => {
    cb => &Mojo::IOLoop::Stream::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop/Stream.pm:63],
    mode => 3,
  },
  12 => {
    cb => &Mojo::IOLoop::Stream::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop/Stream.pm:63],
    mode => 3,
  },
  5 => {
    cb => &Mojo::IOLoop::Server::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop/Server.pm:90],
    mode => 3,
  },
},

Most of the time, the order doesn't matter. However, after the server responds, the sequence of events becomes critical, leading to two possible scenarios.

Scenario 1: The Smooth Stop

  1. The server responds and queues the _finish event (line 247 in Mojo/Server/Daemon.pm). https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/Server/Daemon.pm#L244-L247
  2. File handle 12 is selected, triggering _finish: https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/IOLoop/Stream.pm#L131 This gracefully closes the transaction.
  3. File handle 11 is processed, triggering read and resolving the request.
  4. The callback for $ua->get is called.
  5. The IOLoop is stopped gracefully, the server is stopped, undef $server is called, and the server is destroyed.

Scenario 2: The Crash

  1. The server responds and queues the _finish event.
  2. File handle 11 is processed before 12. This triggers read and resolves the request.
  3. The callback for $ua->get is called.
  4. The IOLoop is stopped gracefully, the server is stopped, and undef $server is called. The server is destroyed!
  5. File handle 12 is selected. The application tries to run _finish, but it fails because $self (the server) no longer exists. This leads to the Can't call method "_finish" exception, and the transaction's finish event is never fired. https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/Server/Daemon.pm#L247

Solutions and Workarounds

So, how do we fix this? Let's explore some options.

1. The Quick Fix (with Caveats)

A simple workaround is to add a check for $self before calling $self->$next($id) in Mojo::IOLoop::Stream:

-$self->ioloop->stream($id)->write($chunk => sub { $self->$next($id) });
+$self->ioloop->stream($id)->write($chunk => sub { $self and $self->$next($id) });

While this prevents the crash, it doesn't solve the underlying problem of the finish event not being called for the transaction. This could lead to unexpected behavior if your application relies on this event.

2. A Deeper Dive: Framework Behavior

The core issue is that the framework allows application destruction while streams still have pending events within the Mojo::IOLoop. Consider the call stack when undef $server happens:

  1. Line 26 in Mojo/Server/Daemon.pm: https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/Server/Daemon.pm#L22-L27

  2. Line 86 in Mojo/IOLoop.pm: https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/IOLoop.pm#L83-L88

    Pay close attention to stream->events->drain. The stream might still have subscribers to the drain event, even if the buffer is empty. This happens because the server has written everything, and the client has read it, but the drain event is still scheduled.

$c
{
  stream => Mojo::IOLoop::Stream {
    buffer => ,
    events => {
      close => [
        &Mojo::IOLoop::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop.pm:234],
        &Mojo::Server::Daemon::__ANON__[lib-nodebug/lib/perl5/Mojo/Server/Daemon.pm:205],
      ],
      drain => [
        &Mojo::EventEmitter::__ANON__[lib-nodebug/lib/perl5/Mojo/EventEmitter.pm:36],
      ],
      error => [
        &Mojo::Server::Daemon::__ANON__[lib-nodebug/lib/perl5/Mojo/Server/Daemon.pm:206],
      ],
      read => [
        &Mojo::Server::Daemon::__ANON__[lib-nodebug/lib/perl5/Mojo/Server/Daemon.pm:207],
      ],
      timeout => [
        &Mojo::Server::Daemon::__ANON__[lib-nodebug/lib/perl5/Mojo/Server/Daemon.pm:208],
      ],
    },
    handle => IO::Socket::IP GLOB,
    reactor => Mojo::Reactor::Poll {
      events => {
        error => [
          &Mojo::IOLoop::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop.pm:29],
        ],
      },
      io => {
        11 => {
          cb => &Mojo::IOLoop::Stream::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop/Stream.pm:63],
          mode => 3,
        },
        12 => {
          cb => &Mojo::IOLoop::Stream::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop/Stream.pm:63],
          mode => 7,
        },
      },
      next_tick => [],
      running => 1,
      timers => {
        e90331dbbf9ff2c8d5155034be091f0d => {
          after => 2,
          cb => &Mojo::IOLoop::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop.pm:243],
          recurring => 0,
          time => 36935.478503053,
        },
      },
    },
    read => 109,
    timeout => 0,
    written => 144,
  },
}
  1. Line 30 in Mojo/IOLoop/Stream.pm: https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/IOLoop/Stream.pm#L30

    The stream is still writing because of these drain subscribers, even though the buffer is empty. The flow originates from the Mojo::Server::Daemon::DESTROY sub, which means the application is in the process of destruction. Trying to gracefully close the stream here is problematic because this might be the last incoming connection. This could lead to the Mojo::IOLoop being stopped, and the drain event will never be called. https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/IOLoop.pm#L211 https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/IOLoop.pm#L197-L199

3. Key Questions and Potential Improvements

Several questions arise from this analysis:

  1. Why is the drain call scheduled for later instead of immediately when the transaction completes and the buffer is empty?
  2. Why are the file handles still sitting inside Mojo::Reactor::Poll->{io} after the application is destroyed? This happens because Mojo::IOLoop is in graceful stop mode.
  3. Is it a good idea to wait for all timers to finish during application destruction? While waiting might be desirable in graceful stop mode, it might not be appropriate during forced shutdown. The condition for waiting for timers could be improved: https://github.com/mojolicious/mojo/blob/834a1f3e58ec72726e9b81f45f8e95f34484085d/lib/Mojo/IOLoop.pm#L197-L199 If Mojo::IOLoop still has timers, it might be necessary to delay the stop until they complete. The reactor state with timers looks like this:
reactor => Mojo::Reactor::Poll {
  events => {
    error => [
      &Mojo::IOLoop::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop.pm:29],
    ],
  },
  io => {
    11 => {
      cb => &Mojo::IOLoop::Stream::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop/Stream.pm:63],
      mode => 3,
    },
  },
  next_tick => [],
  running => 1,
  timers => {
    a284b4beeddf853b9db6b47e5b33eb8d => {
      after => 2,
      cb => &Mojo::IOLoop::__ANON__[lib-nodebug/lib/perl5/Mojo/IOLoop.pm:243],
      recurring => 0,
      time => 40348.46695112,
    },
  },
},

Towards a Robust Solution

While Test::Mojo is a fantastic tool for testing Mojolicious applications, it might not fully address the complexities of a ::Prefork server in a production environment. For a robust solution, consider these approaches:

  1. Delay server destruction: Avoid calling undef $server immediately after stopping the IOLoop. Give the IOLoop a chance to process pending events and timers before destroying the server instance. This can be achieved by using Mojo::IOLoop->next_tick to defer the destruction.
  2. Improve drain event handling: Investigate why the drain event is not called immediately after the transaction completes. Optimizing this could prevent streams from lingering and causing issues during shutdown.
  3. Refine timer management during shutdown: Implement a more nuanced approach to handling timers during shutdown. Differentiate between graceful shutdowns where timers should be allowed to complete and forced shutdowns where timers can be discarded.
  4. Address the race condition: The core issue is a race condition between the I/O loop processing file handles and the server destruction. A potential solution could involve synchronizing these operations to ensure that file handle processing completes before server destruction.

Conclusion

Gracefully stopping a Mojolicious application can be tricky, but understanding the underlying mechanisms and potential pitfalls is crucial. By addressing the race condition, optimizing event handling, and refining timer management, you can ensure your Mojolicious apps shut down cleanly and reliably, preventing lost timers and unexpected errors. So, go forth and build robust, gracefully stopping applications!