What makes queues appear, how do they behave, and when do they go away?
Note
This is applicable to many fields: industrial design, call centres, computers executing transactions
- Given a choice between a single machine with speed s, or n machines each with speed s/n, which should we choose?
- If the arrival rate and service rate both double, how does the mean response time change?
- Should we try to balance load, or is that a waste of time and effort?
- Can we give priority to certain operations without harming another category of job?
- How do job size variability and heavy-tailed workloads affect our choice of scheduling policy?
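For the simplest model (an M/M/1 queue: Poisson arrivals, exponential service, one server — an assumption of mine, not something these notes prescribe), mean response time has a closed form, which lets us answer the doubling question numerically:

```python
# M/M/1 mean response time: E[R] = 1 / (mu - lam), valid only when lam < mu.
def mm1_response_time(lam: float, mu: float) -> float:
    if lam >= mu:
        raise ValueError("unstable: arrival rate must be below service rate")
    return 1.0 / (mu - lam)

base = mm1_response_time(lam=3.0, mu=4.0)     # 1 / (4 - 3) = 1.0
doubled = mm1_response_time(lam=6.0, mu=8.0)  # 1 / (8 - 6) = 0.5
# Doubling both rates halves the mean response time in this model.
```

Under this model, scaling both rates by k divides the mean response time by k, since E[R] = 1/(kμ − kλ) = (1/k)·1/(μ − λ).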
Formal terms
- Server (S): The thing that fulfills requests
- Customer (V): The initiator of the service requests
- Wait time (W): The time a customer spends waiting in line
- Service time (s): The time from when a worker starts to serve a customer until the next customer is called forward
- Arrival rate (λ): The rate at which customers arrive
- Service rate (μ): The rate at which customers are served
- Utilization (U): The fraction of the worker's time spent actually handling requests (not idling)
- Queue length (N): The total number of customers waiting or currently with a worker
- Response time (R): The sum of the wait time and service time for a single visit
- Residence time (R'): The total response time when a customer visits several workers, or the same one multiple times
- Throughput (X): The rate at which customers get their requests serviced
- E[·]: The expected value of some other quantity in the system, e.g. E[S] is the mean service time
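These quantities are tied together by standard identities — U = λ/μ for a single server, and Little's Law, N = λ·R — which the following sketch applies to made-up numbers:

```python
lam = 8.0   # arrival rate, customers/sec (hypothetical)
mu = 10.0   # service rate, customers/sec (hypothetical)
R = 0.5     # measured mean response time, sec (hypothetical)

U = lam / mu   # utilization of a single server: fraction of time busy
N = lam * R    # Little's Law: mean number of customers in the system
print(U, N)    # 0.8 4.0
```

Little's Law holds for any stable system regardless of the arrival or service distribution, which makes it a handy sanity check on measurements.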
Computer Context
The CPU uses a time-sharing scheduler to run many programs concurrently. A router has a bounded queue for packets; if the bound is exceeded, packets are dropped. Queueing theory gives us a formal framework to analyze these problems instead of guessing. See Simple Scaling.
Closed vs Open Systems
- In an open system, customers arrive from the outside and depart permanently after being served. This is like requests hitting a web server
- In a closed system, there is a fixed population of N customers: there are always N in the system, and the same customers cycle through repeatedly. This is like a database connection pool.
We typically require that λ ≤ μ and assume that the system is stable. These values are averages, so we may temporarily fall behind a bit and make up for it later. We do need to make up for any shortfall, though, otherwise we become overloaded.
Let t denote time, N(t) the number of jobs in the system at time t, A(t) the number of arrivals by time t, and D(t) the number of departures by time t, so N(t) = A(t) − D(t).
If arrivals persistently exceed departures, A(t) − D(t) grows without bound and N(t) also goes to infinity.
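A minimal discrete-event sketch (my own illustration, assuming Poisson arrivals and exponential service at a single FIFO server) makes N(t) = A(t) − D(t) concrete: with λ < μ the backlog stays small, while λ > μ makes it grow without bound.

```python
import random

def simulate(lam: float, mu: float, horizon: float, seed: int = 0) -> int:
    """Single FIFO server; returns N(horizon) = arrivals - departures by horizon."""
    rng = random.Random(seed)
    t, arrivals, departures = 0.0, 0, 0
    server_free_at = 0.0  # time the server next becomes free
    while True:
        t += rng.expovariate(lam)          # next arrival (exponential gap)
        if t > horizon:
            break
        arrivals += 1
        start = max(t, server_free_at)     # wait if the server is busy
        server_free_at = start + rng.expovariate(mu)
        if server_free_at <= horizon:
            departures += 1                # job finished within the horizon
    return arrivals - departures

print(simulate(lam=5, mu=10, horizon=1000))   # stable: backlog stays small
print(simulate(lam=10, mu=5, horizon=1000))   # unstable: roughly (lam - mu) * horizon
```

The exact numbers depend on the random seed, but the stable/unstable contrast is robust.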
We typically want to raise μ. This does not necessarily improve X, since there may simply be no work to do; it only increases the maximum potential throughput.
Note that in a closed system, we are always at capacity, since there is always more work to do: as soon as one customer finishes, another enters the queue. In this case, μ is the controlling factor, and the throughput is exactly the service rate: X = μ.
It is advantageous to know, given the arrival rate λ, the mean job size E[S]. The open-system strategy is to ramp up λ; once the completion rate levels off, we have hit the limit and have X = μ = 1/E[S]. For the closed system, we keep it totally busy in a stress test and directly measure jobs completed per second, giving us μ directly.
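In an open system the steady-state completion rate is X = min(λ, μ), so the ramp-up strategy amounts to watching for the plateau. A toy sketch of that ramp (μ here is a made-up capacity we pretend not to know):

```python
mu = 50.0  # hypothetical unknown capacity we want to discover

def steady_state_throughput(lam: float) -> float:
    # Open system: completions track arrivals until capacity is hit.
    return min(lam, mu)

for lam in (10, 30, 50, 70, 90):
    print(lam, steady_state_throughput(lam))
# X climbs with lam, then levels off at 50: that plateau is mu,
# and the mean job size is E[S] = 1 / mu = 0.02.
```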
Problems when Queues Get Long
- Balking: Don’t get in line at all
- Reneging: Get in line, then leave
- Loss: No capacity, request to enqueue refused (a loss system)
Customers estimate their wait time and compare it against their willingness to wait.
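One way to model balking (a sketch of my own, not from these notes): each arriving customer estimates the wait as queue length divided by the service rate, draws a random patience, and joins only if the estimate is within it.

```python
import random

def balking_sim(lam: float, mu: float, mean_patience: float,
                horizon: float, seed: int = 0):
    """Single FIFO server with balking; returns (joined, balked) counts."""
    rng = random.Random(seed)
    t, joined, balked = 0.0, 0, 0
    completions = []      # scheduled departure times of jobs still in the system
    server_free_at = 0.0
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            break
        completions = [c for c in completions if c > t]  # drop finished jobs
        estimated_wait = len(completions) / mu           # naive wait estimate
        patience = rng.expovariate(1.0 / mean_patience)  # this customer's limit
        if estimated_wait > patience:
            balked += 1                                  # doesn't get in line
            continue
        joined += 1
        start = max(t, server_free_at)
        server_free_at = start + rng.expovariate(mu)
        completions.append(server_free_at)
    return joined, balked

print(balking_sim(lam=12, mu=10, mean_patience=0.3, horizon=500))
```

Note that balking self-regulates the queue: even with λ > μ the line stays finite, because the effective arrival rate falls as the queue grows.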
Note
What if people can queue ahead?
Priorities
A queueing system may grant some customers priority over others.
- How much if any, does giving priority to one group over another help the group being given priority?
- How much, if any, does giving priority to one group disadvantage the group not being given priority?
- Does this incentivize people to choose less popular options?
- If everyone has priority, no one does
Complexity
Queues become complex when we consider agents that operate independently.
- Different arrival and departure times
- Different willingness to wait
- Requests are time bounded
- Rate limiting
- Different services that have independent service rates
- Services have fixed maximum capacities
Simulation Configuration
Because of this complexity, we typically need to simulate queue behaviour. We can configure:
User Types
- How long users stay
- When they balk
- What they want to do
Systems
- No priority
- Priority
- Better priority? (Virtual queueing where you can do other stuff)
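Those knobs can be captured in a small configuration sketch (the field and policy names here are my own, purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class UserType:
    name: str
    arrival_rate: float       # customers per unit time
    mean_patience: float      # how long they wait before reneging
    mean_service_time: float  # what they want to do, in time units

@dataclass
class SystemConfig:
    policy: str               # "fifo", "priority", or "virtual" (queue-while-away)
    servers: int
    queue_capacity: int       # requests beyond this are lost

config = SystemConfig(policy="priority", servers=2, queue_capacity=50)
users = [
    UserType("regular", arrival_rate=8.0, mean_patience=2.0, mean_service_time=0.2),
    UserType("vip",     arrival_rate=1.0, mean_patience=5.0, mean_service_time=0.2),
]
```

Sweeping a simulation over these parameters is how we would answer the priority questions above: run the same user mix under "fifo" and "priority" and compare each group's wait times.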
Summary
Knowledge leads to a huge inequality between participants, and complexity allows for exploitation.